🤖 AI Summary
This study addresses the automatic identification and separation of conversation partners in multi-speaker scenarios for hearing aids. We propose a prompt-free, on-device, real-time proactive hearing assistance method that uses the wearer's self-speech as an acoustic anchor, models turn-taking behavior, and integrates binaural spatial cues. The approach pairs a lightweight streaming model for ultra-low-latency online processing with a slower, context-aware model that captures long-term conversational dynamics, forming a dual-timescale architecture. Evaluated on 6.8 hours of real-world dyadic and triadic dialogue data from 11 participants, the method significantly improves target speech separation accuracy and speech intelligibility, supports efficient on-device deployment, and generalizes robustly across diverse acoustic environments. To our knowledge, this is the first work to achieve prompt-free conversational speech separation via self-speech anchoring and explicit turn-taking modeling.
📝 Abstract
We introduce proactive hearing assistants that automatically identify and separate the wearer's conversation partners, without requiring explicit prompts. Our system operates on egocentric binaural audio and uses the wearer's self-speech as an anchor, leveraging turn-taking behavior and dialogue dynamics to infer conversational partners and suppress others. To enable real-time, on-device operation, we propose a dual-model architecture: a lightweight streaming model runs every 12.5 ms for low-latency extraction of the conversation partners, while a slower model runs less frequently to capture longer-range conversational dynamics. Results on real-world 2- and 3-speaker conversation test sets, collected with binaural egocentric hardware from 11 participants totaling 6.8 hours, show generalization in identifying and isolating conversational partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement. More information can be found on our website: https://proactivehearing.cs.washington.edu/
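The dual-model design above can be illustrated with a minimal scheduling sketch: a fast streaming model processes every 12.5 ms frame, while a slower context model runs at a coarser cadence to update longer-range conversational state. The function names, the slow-model cadence, and the placeholder model logic are all illustrative assumptions; the abstract only specifies the 12.5 ms fast-path interval.

```python
# Illustrative sketch of a dual-timescale inference loop (not the paper's
# actual models). The fast model runs on every 12.5 ms frame; the slow
# model's once-per-second cadence is an assumption for illustration.

FAST_HOP_MS = 12.5          # fast-model interval, stated in the abstract
SLOW_EVERY_N_FRAMES = 80    # assumed: slow model runs once per second

def fast_model(frame, context):
    """Placeholder low-latency separator: scales the frame by the
    slow model's context weight (stand-in for partner extraction)."""
    return [s * context for s in frame]

def slow_model(history):
    """Placeholder context model: stands in for longer-range
    turn-taking / dialogue-dynamics inference over past frames."""
    return 1.0 if history else 0.0

def run(frames):
    context, history, outputs = 0.0, [], []
    for i, frame in enumerate(frames):
        if i % SLOW_EVERY_N_FRAMES == 0:
            context = slow_model(history)   # infrequent context update
        outputs.append(fast_model(frame, context))  # every 12.5 ms
        history.append(frame)
    return outputs
```

The key design point this sketch captures is that low-latency output never waits on the expensive context computation: the fast path always emits a frame, using whatever context the slow path last produced.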