ERSAM: Neural Architecture Search for Energy-Efficient and Real-Time Social Ambiance Measurement

📅 2023-03-19
🏛️ IEEE International Conference on Acoustics, Speech, and Signal Processing
🤖 AI Summary
Addressing the challenges of privacy sensitivity, scarce labeled data, and hardware constraints in mobile social ambiance measurement (SAM), this paper proposes the first lightweight neural architecture search (NAS) framework specifically designed for SAM. The framework integrates weight-sharing NAS, hardware-aware search space design, a compact temporal audio feature extraction network, and a few-shot adaptive training strategy to jointly optimize accuracy, energy efficiency, and inference latency. On a Pixel 3 smartphone, the searched networks process a 5-second audio clip in 0.05 seconds at an average power of 40 mW over 12 hours, while achieving a 14.3% error rate on a LibriSpeech-derived social ambiance dataset, substantially outperforming existing methods. Its core contribution is advancing the Pareto frontier of accuracy versus energy efficiency for mobile deep neural networks, enabling accurate, low-power, real-time counting of concurrent speakers. This provides a deployable solution for data-constrained clinical and edge applications.
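The weight-sharing idea mentioned in the summary can be illustrated with a toy one-layer "supernet": every candidate width reuses a slice of one shared weight matrix, so a candidate can be evaluated without training it separately. This is a minimal sketch assuming a single dense layer and a width-only search space; `SharedDense` and `WIDTHS` are hypothetical names, and this is not ERSAM's actual search space or training procedure.

```python
import numpy as np

# Hypothetical width choices for one layer of the search space.
WIDTHS = (16, 32, 64)


class SharedDense:
    """One dense layer whose weights are shared by every candidate width."""

    def __init__(self, in_features: int, max_width: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # A single weight tensor sized for the widest candidate.
        self.weight = rng.standard_normal((in_features, max_width)) * 0.01
        self.bias = np.zeros(max_width)

    def subnet_params(self, width: int):
        # Basic slicing returns views, so every subnet reads (and, during
        # supernet training, would update) the same underlying parameters.
        return self.weight[:, :width], self.bias[:width]

    def forward(self, x: np.ndarray, width: int) -> np.ndarray:
        w, b = self.subnet_params(width)
        return np.maximum(x @ w + b, 0.0)  # ReLU activation


layer = SharedDense(in_features=40, max_width=max(WIDTHS))
features = np.ones((8, 40))  # e.g. 8 frames of 40 hypothetical audio features
for width in WIDTHS:
    out = layer.forward(features, width)
    print(width, out.shape)  # (8, width)
```

Narrow candidates are prefixes of the widest one, which is why a shared supernet can rank many architectures cheaply; practical weight-sharing NAS must also manage interference between subnets that share parameters during training.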
📝 Abstract
Social ambiance describes the context in which social interactions happen, and can be measured from speech audio by counting the number of concurrent speakers. This measurement has enabled various mental health tracking and human-centric IoT applications. While on-device Social Ambiance Measurement (SAM) is highly desirable to ensure user privacy and thus facilitate wide adoption of the aforementioned applications, the computational complexity of state-of-the-art deep neural network (DNN)-powered SAM solutions is at odds with the often constrained resources of mobile devices. Furthermore, only limited labeled data is available or practical for SAM in clinical settings due to privacy constraints and the required human effort, which further limits the achievable accuracy of on-device SAM solutions. To this end, we propose ERSAM, a dedicated neural architecture search framework for Energy-efficient and Real-time SAM. Specifically, ERSAM can automatically search for DNNs that push forward the achievable accuracy vs. hardware efficiency frontier of mobile SAM solutions. For example, ERSAM-delivered DNNs consume only 40 mW·12 h of energy and need only 0.05 seconds of processing latency for a 5-second audio segment on a Pixel 3 phone, while achieving an error rate of only 14.3% on a social ambiance dataset generated from LibriSpeech. We expect that ERSAM can pave the way for ubiquitous on-device SAM solutions, which are in growing demand.
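The reported efficiency figures can be turned into a few derived quantities. The sketch below reads "40 mW·12 h" as an average power of 40 mW sustained for 12 hours (the interpretation used in the AI summary) and computes the corresponding energy, plus the real-time factor (RTF, processing time divided by audio duration). The function names are illustrative only.

```python
def energy_from_power(power_mw: float, hours: float) -> tuple[float, float]:
    """Return (energy in mWh, energy in joules) for a constant average power."""
    energy_mwh = power_mw * hours
    energy_j = energy_mwh * 3.6  # 1 mWh = 1e-3 W * 3600 s = 3.6 J
    return energy_mwh, energy_j


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time per second of audio; values below 1 are faster than real time."""
    return processing_s / audio_s


mwh, joules = energy_from_power(40.0, 12.0)
rtf = real_time_factor(0.05, 5.0)
print(f"{mwh:.0f} mWh = {joules:.0f} J, RTF = {rtf:.2f}")
```

Under this reading, the reported numbers correspond to 480 mWh (about 1.7 kJ) and an RTF of 0.01, i.e. each 5-second segment is processed in 1% of its duration.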
Problem

Research questions and friction points this paper is trying to address.

Develop energy-efficient neural networks for real-time social ambiance measurement
Overcome limited labeled data challenges in clinical SAM settings
Balance accuracy and hardware efficiency for on-device SAM solutions
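Because SAM is framed as counting concurrent speakers, labeled examples can in principle be synthesized by overlapping single-speaker recordings, which is one way a corpus such as LibriSpeech can yield a social ambiance dataset without manual annotation. The sketch below assumes the single-speaker clips are already loaded as NumPy arrays at a common sample rate (replaced here by random-noise stand-ins) and that each clip comes from a distinct speaker; `make_mixture` is a hypothetical helper, not the paper's data pipeline.

```python
import numpy as np


def make_mixture(clips, num_speakers, segment_len, rng):
    """Overlap `num_speakers` randomly chosen clips; return (waveform, label).

    Assumes every clip is from a different speaker and has at least
    `segment_len` samples. The label is the number of concurrent speakers.
    """
    if num_speakers == 0:
        return np.zeros(segment_len, dtype=np.float32), 0
    chosen = rng.choice(len(clips), size=num_speakers, replace=False)
    mixture = np.zeros(segment_len, dtype=np.float32)
    for idx in chosen:
        clip = clips[idx]
        # Random crop so different mixtures use different parts of each clip.
        start = rng.integers(0, len(clip) - segment_len + 1)
        mixture += clip[start:start + segment_len]
    # One possible choice: peak-normalize so overall loudness does not
    # trivially reveal how many speakers were summed.
    peak = np.max(np.abs(mixture))
    if peak > 0:
        mixture /= peak
    return mixture, num_speakers


rng = np.random.default_rng(0)
sample_rate = 16_000  # LibriSpeech audio is sampled at 16 kHz
segment_len = 5 * sample_rate  # 5-second segments, matching the abstract
# Stand-ins for single-speaker utterances; real code would load audio files.
clips = [rng.standard_normal(6 * sample_rate).astype(np.float32) for _ in range(10)]
wave, label = make_mixture(clips, num_speakers=3, segment_len=segment_len, rng=rng)
print(wave.shape, label)  # (80000,) 3
```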
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural architecture search for efficient SAM
Automated DNN optimization for mobile devices
Balances accuracy and hardware efficiency
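Balancing accuracy and hardware efficiency is usually summarized by a Pareto frontier: the set of candidate networks that no other candidate matches or beats on every metric while being strictly better on at least one. The sketch below filters hypothetical (error rate, energy) pairs; the candidate names and numbers are made up for illustration and are not results from the paper.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    error_rate: float  # lower is better
    energy_mj: float   # energy per inference in mJ, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both metrics and strictly better on one."""
    no_worse = a.error_rate <= b.error_rate and a.energy_mj <= b.energy_mj
    strictly_better = a.error_rate < b.error_rate or a.energy_mj < b.energy_mj
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep candidates not dominated by any other, sorted by energy."""
    front = [c for c in candidates
             if not any(dominates(o, c) for o in candidates if o is not c)]
    return sorted(front, key=lambda c: c.energy_mj)


# Hypothetical search results, for illustration only.
candidates = [
    Candidate("A", 0.20, 1.0),
    Candidate("B", 0.15, 2.0),
    Candidate("C", 0.18, 2.5),  # dominated by B
    Candidate("D", 0.12, 4.0),
    Candidate("E", 0.25, 1.5),  # dominated by A
]
print([c.name for c in pareto_front(candidates)])  # ['A', 'B', 'D']
```

A search "pushes the frontier" when it finds candidates that dominate points on the previous front. Since ERSAM also targets inference latency, a faithful version would add latency as a third metric in `dominates`.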