Visual Attention Mapping
Overview
How Naturalistic Visuo-Auditory Cues Guide Human Attention
📅 Sep 2022 – Jan 2024
🏛️ University of Skövde, Sweden; Örebro University, Sweden; Constructor University, Germany
👥 Collaborators: Paul Hemeren, Erik Billing, Mehul Bhatt, Jakob Suchan
This project systematically investigated how visuospatial and auditory cues guide visual attention during passive observation of naturalistic human interactions. Building on cognitive vision principles, we developed a novel event model to analyze cue-attention dynamics in film-like narrative scenarios.
Aim
To characterize how five core visuoauditory cues modulate attention:
- Speaking (vocal turn-taking patterns)
- Gaze direction (reference/mutual/transition)
- Relative motion (directional vs. non-directional)
- Hand actions (pointing/gesturing/reaching)
- Visibility dynamics (agents entering or exiting the scene)
The analysis focused on quantifying both unimodal (individual) and multimodal (combined) cueing effects on attentional flow.
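To make the cue categories concrete, here is a minimal sketch of how they could be represented in code; the enum and its labels are illustrative assumptions, not the project's actual annotation vocabulary.

```python
from enum import Enum

class CueType(Enum):
    """Illustrative labels for the five core cue categories listed above."""
    SPEAKING = "speaking"          # vocal turn-taking patterns
    GAZE = "gaze_direction"        # reference / mutual / transition
    MOTION = "relative_motion"     # directional vs. non-directional
    HAND_ACTION = "hand_action"    # pointing / gesturing / reaching
    VISIBILITY = "visibility"      # agents entering or exiting the scene
```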
Methods
Structured Event Modeling Approach
- Developed a formal visuoauditory event model with three components (a schematic encoding is sketched after this list):
  - Scene elements (objects/regions)
  - Scene structure (interaction modalities)
  - Visual attention (eye-tracking metrics)
- Coded 27 narrative-driven interaction scenarios using the ELAN annotation framework
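As an informal illustration only, the three-component event model could be encoded along the following lines; the class and field names are my assumptions, not the published formalization.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SceneElement:
    """An object or region that can receive attention (e.g., an agent's face or hands)."""
    element_id: str
    kind: str                       # e.g. "agent", "object", "region"

@dataclass
class SceneStructure:
    """An interaction modality active during the event (speaking, gaze, motion, ...)."""
    modality: str                   # one of the five cue categories
    subtype: str                    # e.g. "mutual gaze", "pointing"
    participants: List[str]         # element_ids of the agents/objects involved

@dataclass
class AttentionRecord:
    """Eye-tracking metrics aggregated over the event interval."""
    fixated_element: str            # element_id receiving the fixation
    fixation_duration_ms: float
    saccade_magnitude_deg: float

@dataclass
class VisuoAuditoryEvent:
    """One interval of the event model: scene elements, scene structure, visual attention."""
    start_ms: int
    end_ms: int
    elements: List[SceneElement] = field(default_factory=list)
    structure: List[SceneStructure] = field(default_factory=list)
    attention: List[AttentionRecord] = field(default_factory=list)
```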
Experimental Design
- Collected eye-tracking data from 90 participants (3 groups × 30)
- Between-subjects design: Each group viewed one of the three variations of each of the 9 scene contexts
- Eye movement metrics: Fixation duration, saccade magnitude, object-level attention
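For illustration, the listed metrics can be derived from raw gaze samples with a simple velocity-threshold classifier; this is a toy sketch under assumed units (degrees of visual angle, milliseconds), not the project's actual processing pipeline.

```python
import numpy as np

def classify_fixations(x, y, t_ms, velocity_threshold_deg_s=30.0):
    """Label inter-sample movements as fixation (below threshold) or saccade (above).
    x, y are gaze coordinates in degrees of visual angle; t_ms are timestamps in ms."""
    dt_s = np.diff(t_ms) / 1000.0                # sample intervals in seconds
    step_deg = np.hypot(np.diff(x), np.diff(y))  # angular distance between samples
    velocity = step_deg / dt_s                   # deg/s
    return velocity < velocity_threshold_deg_s   # True where gaze is fixating

def fixation_durations(is_fixation, t_ms):
    """Duration (ms) of each consecutive run of fixation-labelled samples."""
    durations, start = [], None
    for i, fix in enumerate(is_fixation):
        if fix and start is None:
            start = i
        elif not fix and start is not None:
            durations.append(t_ms[i] - t_ms[start])
            start = None
    if start is not None:
        durations.append(t_ms[len(is_fixation)] - t_ms[start])
    return durations
```

Saccade magnitude can be obtained analogously as the gaze displacement across each run of above-threshold samples, and object-level attention by intersecting fixation positions with annotated scene-element regions.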
Stimuli & Data
Custom Naturalistic Stimuli Set
- 27 film-like scenarios depicting everyday interactions
- 9 core scene contexts × 3 variations (e.g., differing gaze/hand action patterns)
- Featured intentional actions aligned with narrative goals
- Public dataset: Annotations, eye-tracking data, and stimuli available for open science
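A minimal loading sketch for the annotation files, assuming they are released in ELAN's .eaf format and that the pympi-ling package is available; the file name and tier name below are placeholders rather than the dataset's actual naming.

```python
# pip install pympi-ling
import pympi

# Placeholder names; check the dataset webpage for the actual files and tier labels.
eaf = pympi.Elan.Eaf("scenario_01.eaf")
print("Tiers:", eaf.get_tier_names())

# Annotations on a tier come back as (start_ms, end_ms, value) tuples.
for ann in eaf.get_annotation_data_for_tier("gaze"):
    start_ms, end_ms, label = ann[0], ann[1], ann[2]
    print(f"{start_ms:>7}-{end_ms:>7} ms  {label}")
```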
Key Findings
- Speaking: Turn-taking triggered anticipatory gaze shifts to next speaker
- Gaze: Observers reliably followed gaze direction (attention shift to referenced agent)
- Hand Actions: Pointing/gesturing drew more attention than grasping/holding
- Visibility: Entry events captured immediate attention; exits triggered anticipatory shifts to remaining agents
- Motion combined with hand actions produced synergistic cueing effects
- When cues co-occurred, speaking dominated attention, while gaze redirected it toward the referenced agent
Future Directions
- Investigate context-locked cues in narrative variations
- Model anticipatory gaze in event prediction
- Extend framework to active participation scenarios
- Develop probabilistic cue-interaction models
Project Outcomes
- Manuscript under peer review – 📚 See Publication
- Public dataset of annotated stimuli + eye-tracking data – 💾 Dataset Webpage
Collaboration Opportunities
Open to collaboration or discussion on the methodology, the dataset, or future directions. Happy to exchange ideas and explore new perspectives.