🤖 AI Summary
This work addresses own-voice detection in single-microphone hearing aids with a hierarchical transfer-learning approach built on simulated acoustic transfer functions. A Transformer-based classifier is first pretrained on an analytical rigid-sphere model and then progressively fine-tuned on a high-fidelity head-and-torso model, significantly improving its generalization capability. The method requires neither additional microphones nor auxiliary sensors, achieving 95.52% accuracy on simulated test data and 90.02% with only one second of speech. Notably, without any fine-tuning on real-world data, it attains 80% accuracy on recordings from actual hearing aids, demonstrating strong simulation-to-reality transfer. This study thus establishes a low-cost, high-accuracy paradigm for own-voice detection in hearing assistance systems.
📝 Abstract
This paper presents a simulation-based approach to own voice detection (OVD) in hearing aids using a single microphone. While OVD can significantly improve user comfort and speech intelligibility, existing solutions often rely on multiple microphones or additional sensors, increasing device complexity and cost. To enable ML-based OVD without requiring costly transfer-function measurements, we propose a data augmentation strategy based on simulated acoustic transfer functions (ATFs) that exposes the model to a wide range of spatial propagation conditions. A transformer-based classifier is first trained on analytically generated ATFs and then progressively fine-tuned using numerically simulated ATFs, transitioning from a rigid-sphere model to a detailed head-and-torso representation. This hierarchical adaptation enables the model to refine its spatial understanding while maintaining generalization. Experimental results show 95.52% accuracy on simulated head-and-torso test data. Under short-duration conditions, the model maintains 90.02% accuracy with one-second utterances. On real hearing aid recordings, the model achieves 80% accuracy without fine-tuning, aided by lightweight test-time feature compensation. This highlights the model's ability to generalize from simulated to real-world conditions, demonstrating practical viability and pointing toward a promising direction for future hearing aid design.