🤖 AI Summary
This work proposes a lightweight, multi-output audio separation network designed for edge deployment, addressing the limitation that existing hearable devices support only global noise suppression or single-target focus. By combining 6-millisecond streaming processing, model optimization for multiple compute-limited platforms, and a dynamic interface that surfaces only the currently active sound classes, the system enables real-time identification and fine-grained volume control of up to five concurrent sound classes on resource-constrained hardware. This approach turns complex acoustic scenes into programmable multi-track audio streams, letting users remix their auditory environment much as an audio engineer mixes tracks. Evaluated in previously unseen real-world indoor and outdoor scenarios, the method achieves substantial improvements in target-class enhancement and interference suppression while running in real time.
📝 Abstract
Hearables are becoming ubiquitous, yet their sound controls remain blunt: users can either enable global noise suppression or focus on a single target sound. Real-world acoustic scenes, however, contain many simultaneous sources that users may want to adjust independently. We introduce Aurchestra, the first system to provide fine-grained, real-time soundscape control on resource-constrained hearables. Our system has two key components: (1) a dynamic interface that surfaces only active sound classes and (2) a real-time, on-device multi-output extraction network that generates a separate stream for each selected class. The network achieves robust performance for up to 5 overlapping target sounds and lets users mix their environment by customizing per-class volumes, much like an audio engineer mixes tracks. We optimize the model architecture for multiple compute-limited platforms and demonstrate real-time performance on 6 ms streaming audio chunks. Across real-world environments in previously unseen indoor and outdoor scenarios, our system enables expressive per-class sound control and achieves substantial improvements in target-class enhancement and interference suppression. Our results show that the world need not be heard as a single, undifferentiated stream: with Aurchestra, the soundscape becomes truly programmable.
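To make the per-class mixing idea concrete, the following is a minimal sketch of how one streaming chunk might be remixed once separate per-class streams are available. It is not the authors' implementation: the function names `chunk_samples` and `remix_chunk`, the 16 kHz sample rate, the default gain of 1.0 for unlisted classes, and the clipping to [-1, 1] are all assumptions made for illustration, and the extraction network itself is not modeled.

```python
from typing import Dict, List


def chunk_samples(sample_rate_hz: int, chunk_ms: float = 6.0) -> int:
    """Number of samples in one streaming chunk (6 ms is from the abstract;
    the sample rate is an assumption supplied by the caller)."""
    return round(sample_rate_hz * chunk_ms / 1000)


def remix_chunk(class_streams: Dict[str, List[float]],
                gains: Dict[str, float]) -> List[float]:
    """Sum per-class streams for one chunk, scaling each by its user gain.

    class_streams: hypothetical output of a multi-output extraction model,
                   one list of samples per selected sound class.
    gains:         per-class volume chosen by the user; classes without an
                   entry keep unit gain (an assumption of this sketch).
    """
    if not class_streams:
        return []
    length = len(next(iter(class_streams.values())))
    mixed = [0.0] * length
    for name, stream in class_streams.items():
        if len(stream) != length:
            raise ValueError(f"stream '{name}' has a different chunk length")
        gain = gains.get(name, 1.0)
        for i, sample in enumerate(stream):
            mixed[i] += gain * sample
    # Clip to the nominal [-1, 1] range so boosted classes cannot overflow.
    return [max(-1.0, min(1.0, s)) for s in mixed]


if __name__ == "__main__":
    # At an assumed 16 kHz sample rate, a 6 ms chunk holds 96 samples.
    print(chunk_samples(16_000))
    streams = {"speech": [0.5, -0.5], "siren": [0.2, 0.2]}
    # Keep speech at full volume and mute the siren.
    print(remix_chunk(streams, {"speech": 1.0, "siren": 0.0}))
```

In this sketch a gain of 0.0 removes a class, 1.0 passes it through unchanged, and values above 1.0 amplify it, which mirrors the per-class volume sliders the abstract describes.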