🤖 AI Summary
Directional voice capture on smartphones remains challenging in noisy, reverberant environments. Method: This paper proposes a lightweight, end-to-end, real-time solution comprising (i) a bio-inspired passive acoustic microstructure for direction encoding, which enables accurate sound-source localization using only the two microphones in standard wired earphone controls, and (ii) a mobile-optimized neural network that decouples source separation from spatial focusing. Contribution/Results: The work introduces the first smartphone-compatible passive acoustic microstructure; it requires no additional hardware or power supply, yet with just two microphones it outperforms conventional five-element microphone arrays. Experiments demonstrate a 5.0 dB SNR improvement within a 30° steering region and real-time inference (<40 ms latency) on commercial devices including the iPhone, significantly improving far-field speech intelligibility.
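To make the decoupled design more concrete, the sketch below shows one plausible shape an angle-conditioned, two-microphone extraction network could take. The class name, the conditioning scheme (adding an angle embedding to encoder features), and all layer sizes are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of an angle-conditioned two-microphone speech extractor.
# Everything here (TwoMicExtractor, layer sizes, conditioning) is an assumption
# for illustration; it is not the architecture described in the paper.
import torch
import torch.nn as nn

class TwoMicExtractor(nn.Module):
    def __init__(self, feat=64, kernel=16, stride=8):
        super().__init__()
        # Learned encoder over the raw 2-channel mixture from the earphone mics.
        self.encoder = nn.Conv1d(2, feat, kernel_size=kernel, stride=stride)
        # Map the target steering angle to a conditioning vector.
        self.angle_fc = nn.Linear(2, feat)  # input: [cos(theta), sin(theta)]
        # Lightweight recurrent separator suitable for streaming inference.
        self.separator = nn.GRU(feat, feat, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(feat, feat), nn.Sigmoid())
        # Decoder back to a single-channel waveform of the target speech.
        self.decoder = nn.ConvTranspose1d(feat, 1, kernel_size=kernel, stride=stride)

    def forward(self, mixture, angle_rad):
        # mixture: (batch, 2, samples); angle_rad: (batch,) target direction
        z = torch.relu(self.encoder(mixture))                      # (B, F, T)
        cond = self.angle_fc(torch.stack([angle_rad.cos(),
                                          angle_rad.sin()], dim=-1))
        z = z + cond.unsqueeze(-1)                                 # inject direction
        h, _ = self.separator(z.transpose(1, 2))                   # (B, T, F)
        m = self.mask(h).transpose(1, 2)                           # (B, F, T)
        return self.decoder(z * m)                                 # (B, 1, samples)

if __name__ == "__main__":
    model = TwoMicExtractor()
    mix = torch.randn(1, 2, 16000)              # 1 s of 16 kHz two-channel audio
    out = model(mix, torch.tensor([0.5]))       # steer toward ~0.5 rad (~30 deg)
    print(out.shape)                            # torch.Size([1, 1, 16000])
```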
📝 Abstract
Imagine placing your smartphone on a table in a noisy restaurant and clearly capturing the voices of friends seated around you, or recording a lecturer's voice with clarity in a reverberant auditorium. We introduce SonicSieve, the first intelligent directional speech extraction system for smartphones using a bio-inspired acoustic microstructure. Our passive design embeds directional cues onto incoming speech without any additional electronics. It attaches to the in-line mic of low-cost wired earphones that plug into smartphones. We present an end-to-end neural network that processes the raw audio mixtures in real time on mobile devices. Our results show that SonicSieve achieves a signal quality improvement of 5.0 dB when focusing on a 30° angular region. Additionally, our two-microphone system outperforms conventional 5-microphone arrays.
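For context on how a figure like the reported 5.0 dB is typically computed, the snippet below sketches a scale-invariant SNR improvement: the SI-SNR of the extracted output minus the SI-SNR of the raw mixture. The exact metric used in the paper may differ, so treat this as an assumed evaluation recipe rather than the authors' protocol.

```python
# Assumed evaluation recipe: scale-invariant SNR improvement in dB.
# The synthetic signals at the bottom are hypothetical, for illustration only.
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR (dB) of an estimate against a reference signal."""
    ref_zm = ref - ref.mean()
    est_zm = est - est.mean()
    # Project the estimate onto the reference to split target from residual noise.
    proj = (est_zm @ ref_zm) / (ref_zm @ ref_zm + eps) * ref_zm
    noise = est_zm - proj
    return 10 * np.log10((proj @ proj) / (noise @ noise + eps) + eps)

def snr_improvement(mixture_ch, estimate, target):
    """Improvement = SI-SNR of the extracted output minus SI-SNR of the mixture."""
    return si_snr(estimate, target) - si_snr(mixture_ch, target)

rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
mixture = target + 0.8 * rng.standard_normal(16000)   # noisy single channel
estimate = target + 0.2 * rng.standard_normal(16000)  # cleaner extracted output
print(f"improvement: {snr_improvement(mixture, estimate, target):.1f} dB")
```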