🤖 AI Summary
How do audiovisual cues—particularly lip movements—enhance speech comprehension and cortical tracking of continuous speech under naturalistic, noisy, and reverberant conditions?
Method: We recorded high-density EEG from participants listening to unscripted, ecologically valid speech in a virtual acoustic environment with simulated reverberation. Lip aperture was quantified from video, acoustic features (e.g., fundamental frequency, jitter) were extracted, and speech envelope–EEG correlations were computed to assess neural tracking fidelity.
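A minimal sketch of the core speech-tracking computation described above, under simplifying assumptions: the broadband amplitude envelope is extracted from the audio, low-pass filtered, resampled to the EEG rate, and correlated with each EEG channel. The function names, the 8 Hz cutoff, and the direct Pearson correlation are illustrative choices, not the authors' exact pipeline.

```python
# Hedged sketch: envelope extraction + envelope-EEG correlation.
# Inputs are hypothetical: `audio` (1-D float array at fs_audio Hz) and
# `eeg` (channels x samples array at fs_eeg Hz).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, resample_poly

def speech_envelope(audio, fs_audio, fs_eeg, cutoff_hz=8.0):
    """Amplitude envelope: |Hilbert transform|, low-pass, resample to EEG rate."""
    env = np.abs(hilbert(audio))                  # instantaneous amplitude
    b, a = butter(3, cutoff_hz / (fs_audio / 2))  # 3rd-order low-pass (~8 Hz, assumed)
    env = filtfilt(b, a, env)                     # zero-phase filtering
    return resample_poly(env, int(fs_eeg), int(fs_audio))

def tracking_correlation(envelope, eeg):
    """Pearson r between the envelope and each EEG channel (channels x samples)."""
    n = min(len(envelope), eeg.shape[1])
    e = (envelope[:n] - envelope[:n].mean()) / envelope[:n].std()
    x = eeg[:, :n]
    x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    return (x @ e) / n                            # one correlation per channel
```

Comparing these per-channel correlations across the audiovisual, audio-only, and masked-lip conditions would quantify the tracking benefit summarized below.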
Contribution/Results: This study provides the first demonstration of an audiovisual speech-tracking benefit with untrained speakers, spontaneous speech, and ecologically valid virtual acoustics. Under noise, audiovisual integration significantly improved cortical speech tracking accuracy (p < 0.001); occluding the lips abolished this benefit, reducing performance to auditory-only levels and confirming that visual articulatory cues drive the gain in degraded listening. Critically, speaker-specific acoustic and visual characteristics emerged as key modulators of multimodal integration, highlighting inter-individual variability in audiovisual speech processing.
📝 Abstract
The audiovisual benefit in speech perception, where congruent visual input enhances auditory processing, is well documented across age groups, particularly in challenging listening conditions and among individuals with varying hearing abilities. However, most studies rely on highly controlled laboratory environments with scripted stimuli. Here, we examine the audiovisual benefit using unscripted, natural speech from untrained speakers within a virtual acoustic environment. Using electroencephalography (EEG) and cortical speech tracking, we assessed neural responses across audiovisual, audio-only, visual-only, and masked-lip conditions to isolate the role of lip movements. Additionally, we analysed individual differences in acoustic and visual features of the speakers, including pitch, jitter, and lip openness, to explore their influence on the audiovisual speech tracking benefit. Results showed a significant audiovisual enhancement in speech tracking with background noise, with the masked-lip condition performing similarly to the audio-only condition, emphasizing the importance of lip movements in adverse listening situations. Our findings reveal the feasibility of cortical speech tracking with naturalistic stimuli and underscore the impact of individual speaker characteristics on audiovisual integration in real-world listening contexts.
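For the speaker-level acoustic features named above (pitch and jitter), a rough sketch using the praat-parselmouth package is given below; the function name, pitch-range defaults, and jitter settings are assumptions for illustration rather than the paper's settings, and lip openness would require separate video-based measurement.

```python
# Hedged sketch: per-speaker mean F0 and local jitter via praat-parselmouth.
import numpy as np
import parselmouth
from parselmouth.praat import call

def speaker_acoustics(wav_path, f0_min=75.0, f0_max=500.0):
    """Mean fundamental frequency (Hz) and local jitter for one recording."""
    snd = parselmouth.Sound(wav_path)
    pitch = snd.to_pitch(pitch_floor=f0_min, pitch_ceiling=f0_max)
    f0 = pitch.selected_array["frequency"]
    f0 = f0[f0 > 0]                               # keep voiced frames only
    points = call(snd, "To PointProcess (periodic, cc)", f0_min, f0_max)
    jitter = call(points, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    return {"mean_f0_hz": float(f0.mean()), "jitter_local": float(jitter)}
```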