SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures

📅 2025-04-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Directional voice capture on smartphones remains challenging in noisy, reverberant environments. Method: This paper proposes a lightweight, end-to-end, real-time solution comprising (i) a bio-inspired passive acoustic microstructure that encodes directional cues onto incoming speech and attaches to the in-line mic of low-cost wired earphones, so that only two microphones are needed, and (ii) a lightweight neural network optimized for mobile devices that decouples source separation from spatial focusing. Contribution/Results: The work introduces the first intelligent directional speech extraction system for smartphones built on an acoustic microstructure. The structure requires no additional electronics or power supply, and with just two microphones the system outperforms conventional five-microphone arrays. Experiments show a 5.0 dB signal quality improvement when focusing on a 30° angular region and real-time inference (<40 ms latency) on commercial devices including the iPhone, improving the intelligibility of far-field speech.
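
The summary and abstract report a "5.0 dB" gain but do not name the metric here. A dB improvement of this kind is usually computed as the metric of the processed output minus the metric of the unprocessed mixture, both measured against the clean target speech. The sketch below assumes SI-SDR (scale-invariant signal-to-distortion ratio), a common speech-separation metric; it is not necessarily the metric used in the paper, and the function names `si_sdr` and `improvement_db` are hypothetical.

```python
import numpy as np

def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (assumed metric)."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference: the scaled copy is the "target" part.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target  # everything that is not the reference speech
    return float(10 * np.log10((np.dot(target, target) + eps) /
                               (np.dot(residual, residual) + eps)))

def improvement_db(estimate: np.ndarray, mixture: np.ndarray,
                   reference: np.ndarray) -> float:
    """dB gain of the processed output over the raw mixture (hypothetical helper)."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)

# Usage: synthetic "speech" plus uncorrelated noise; the toy "output" halves the noise.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
gain = improvement_db(speech + 0.5 * noise, speech + noise, speech)
print(f"{gain:.1f} dB")
```

Halving the residual noise amplitude should yield roughly 20·log10(2) ≈ 6 dB, so a 5.0 dB gain corresponds to removing somewhat less than half of the interfering amplitude, under this assumed metric.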

📝 Abstract
Imagine placing your smartphone on a table in a noisy restaurant and clearly capturing the voices of friends seated around you, or recording a lecturer's voice with clarity in a reverberant auditorium. We introduce SonicSieve, the first intelligent directional speech extraction system for smartphones using a bio-inspired acoustic microstructure. Our passive design embeds directional cues onto incoming speech without any additional electronics. It attaches to the in-line mic of low-cost wired earphones, which can be connected to smartphones. We present an end-to-end neural network that processes the raw audio mixtures in real time on mobile devices. Our results show that SonicSieve achieves a signal quality improvement of 5.0 dB when focusing on a 30° angular region. Additionally, our system, based on only two microphones, outperforms conventional 5-microphone arrays.
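
"Focusing on a 30° angular region" can be read as: the desired output contains only the talkers whose direction of arrival lies inside a 30° window around the chosen steering direction. The sketch below builds such a target signal from separately known sources, as one might when preparing training data. It assumes azimuth-only directions, a window centered on the steering angle, and a summed target; none of these details, nor the names `wrap_deg` and `region_target`, come from the paper.

```python
import numpy as np

def wrap_deg(angle: np.ndarray) -> np.ndarray:
    """Wrap angles in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def region_target(sources: np.ndarray, azimuths_deg: np.ndarray,
                  steer_deg: float, width_deg: float = 30.0) -> np.ndarray:
    """Sum the sources whose azimuth lies inside the steered angular window.

    sources: (num_sources, num_samples); azimuths_deg: (num_sources,).
    Hypothetical helper; returns silence if no source is inside the window.
    """
    # Angular distance to the steering direction, handling the 0/360 wrap-around.
    offset = np.abs(wrap_deg(azimuths_deg - steer_deg))
    inside = offset <= width_deg / 2.0
    return sources[inside].sum(axis=0)

# Usage: talkers at 10°, 350° (i.e. -10°) and 90°, steering toward 0°.
sources = np.stack([np.full(4, 1.0), np.full(4, 2.0), np.full(4, 4.0)])
target = region_target(sources, np.array([10.0, 350.0, 90.0]), steer_deg=0.0)
print(target)  # [3. 3. 3. 3.]: only the first two talkers fall inside ±15°
```

Wrapping the angular difference matters here: without it, a talker at 350° would appear 350° away from a 0° steering direction instead of 10°.
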
Problem

Enhancing directional speech capture in noisy environments
Extracting clear speech using passive acoustic microstructures
Improving smartphone microphone performance with minimal hardware
Innovation

Bio-inspired acoustic microstructure for directional cues
Passive design with no additional electronics
End-to-end neural network for real-time processing