🤖 AI Summary
To address speaker overload caused by signal superposition when a smart device plays audio and performs ultrasound sensing at the same time, this paper proposes an audio-agnostic cognitive scaling mechanism: sensing signals are dynamically embedded into the residual bandwidth left by the music, without clipping or global amplitude reduction, enabling real-time acoustic sensing while preserving playback quality. The method employs a lightweight deep learning model, supports both sinusoidal and FMCW sensing waveforms, is compatible with arbitrary concurrent audio streams, and is deployable on edge devices. Experiments demonstrate that respiration monitoring and gesture recognition achieve accuracies approaching interference-free baselines, and a user study confirms no perceptible degradation in audio quality, significantly outperforming existing clipping and down-scaling approaches. This work presents the first solution enabling cooperative, adaptive sharing of a speaker's available bandwidth between sensing and playback.
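The contrast between the two baselines and the adaptive idea can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's method: the function names, sample rate, and the naive per-sample scaler are our own assumptions. Clipping and global down-scaling both touch the music, while a headroom-aware scaler shrinks only the sensing waveform to fit the amplitude budget the music leaves free.

```python
import numpy as np

FS = 48_000  # assumed sample rate (Hz), typical for smart-device speakers

def clip_mix(music: np.ndarray, sensing: np.ndarray) -> np.ndarray:
    """Baseline 1: hard-clip the superposed signal to [-1, 1].
    Cheap, but the clipping distorts both music and sensing signal."""
    return np.clip(music + sensing, -1.0, 1.0)

def downscale_mix(music: np.ndarray, sensing: np.ndarray) -> np.ndarray:
    """Baseline 2: globally attenuate the mix so it never exceeds full scale.
    Avoids clipping, but makes the music quieter and weakens the sensing
    signal, shortening the sensing range."""
    mix = music + sensing
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix

def headroom_mix(music: np.ndarray, sensing: np.ndarray) -> np.ndarray:
    """Toy stand-in for the adaptive idea: per sample, scale only the
    sensing waveform into the headroom the music leaves, so the music is
    untouched and the sensing amplitude is as large as the budget allows.
    (CoPlay learns this adaptation with a deep model precisely because a
    naive per-sample scale like this one smears the sensing spectrum.)"""
    headroom = 1.0 - np.abs(music)                      # remaining amplitude budget
    scale = np.minimum(1.0, headroom / (np.abs(sensing) + 1e-9))
    return music + scale * sensing
```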
📝 Abstract
Acoustic sensing shows great potential in applications such as health monitoring and gesture interfaces by utilizing the built-in speakers and microphones on smart devices. However, ongoing research and development often overlooks one problem: when the same speaker is used concurrently for sensing and traditional audio tasks (like playing music), the two can interfere with each other, making the system impractical. Strong ultrasonic sensing signals mixed with music overload the speaker's mixer. Current solutions to this overload are clipping or down-scaling, both of which degrade music playback quality, sensing range, and accuracy. To address this challenge, we propose CoPlay, a deep learning-based optimization algorithm that cognitively adapts the sensing signal in real time. It can 1) maximize the sensing signal magnitude within the available bandwidth left by the concurrent music, optimizing sensing range and accuracy, and 2) minimize any consequent frequency distortion that would affect music playback. We design a custom model and test it on common types of sensing signals (sine wave and frequency-modulated continuous wave, FMCW) alongside arbitrary types of concurrent music and speech. First, we micro-benchmark the model to show the quality of the generated signals. Second, we conduct two field studies of downstream acoustic sensing tasks on two devices in the real world. A study with 12 users shows that respiration monitoring and gesture recognition using our adapted signal achieve accuracy similar to no-concurrent-music scenarios, whereas the clipping and down-scaling baselines perform worse. A qualitative study further confirms that CoPlay leaves the music untouched, unlike clipping or down-scaling, which degrade music quality.
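For readers unfamiliar with the two sensing waveforms named above, the following sketch generates a single inaudible tone and a repeated FMCW chirp in the near-ultrasonic band commonly used for acoustic sensing. The frequency band (18-22 kHz), chirp duration, and sample rate are our illustrative assumptions, not values taken from the paper.

```python
import numpy as np

FS = 48_000  # assumed sample rate (Hz)

def sine_sensing(freq_hz: float = 20_000, dur_s: float = 1.0) -> np.ndarray:
    """Continuous single-tone sensing signal: phase shifts in the echo
    encode small motions such as chest displacement during respiration."""
    t = np.arange(int(FS * dur_s)) / FS
    return np.sin(2 * np.pi * freq_hz * t)

def fmcw_sensing(f0: float = 18_000, f1: float = 22_000,
                 chirp_s: float = 0.04, n_chirps: int = 25) -> np.ndarray:
    """Repeated linear chirp (FMCW): the beat frequency between the
    transmitted and reflected chirp encodes range, enabling gesture tracking."""
    t = np.arange(int(FS * chirp_s)) / FS
    k = (f1 - f0) / chirp_s                     # sweep rate (Hz/s)
    chirp = np.sin(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))
    return np.tile(chirp, n_chirps)
```

Either waveform would be fed, together with the concurrent audio stream, into the adaptation stage sketched in the summary above before being played through the shared speaker.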