AI Summary
This work identifies a previously unreported acoustic side channel in high-performance optical mouse sensors: ambient audio induces minute vibrations in the desk surface, and the mouse's optical sensor registers them in its raw motion data. An attacker can reconstruct speech using only unprivileged user-space software; no kernel privileges or hardware modifications are required. To the authors' knowledge, this is the first study to repurpose an optical mouse sensor as a covert eavesdropping device. The proposed end-to-end filtering pipeline combines Wiener filtering, resampling correction, and an encoder-only, spectrogram-based neural network to compensate for non-uniform sampling, the sensor's non-linear frequency response, and quantization. Under controlled conditions, the pipeline improves speech signal-to-noise ratio by up to +19 dB. On the AudioMNIST and VCTK benchmarks, speech recognition accuracy reaches roughly 42% to 61%, supporting the practical feasibility of low-privilege acoustic eavesdropping.
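Mouse motion reports do not arrive at perfectly regular intervals, which is why a resampling correction is needed before standard signal processing can be applied. The sketch below is a minimal illustration, assuming each report carries a timestamp in seconds. It uses plain linear interpolation onto a uniform grid, which may differ from the correction the authors actually use, and `resample_uniform` is a hypothetical helper name.

```python
from bisect import bisect_right


def resample_uniform(timestamps, values, rate_hz):
    """Linearly interpolate non-uniformly timed samples onto a uniform grid.

    timestamps: strictly increasing sample times in seconds (assumed to be
                per-report arrival times of mouse motion events).
    values:     displacement readings reported at those times.
    rate_hz:    target uniform sampling rate (must be positive).
    """
    if len(timestamps) != len(values) or len(timestamps) < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    t0, t_end = timestamps[0], timestamps[-1]
    step = 1.0 / rate_hz
    # Small epsilon guards against float round-off dropping the last grid point.
    n_out = int((t_end - t0) * rate_hz + 1e-9) + 1
    out = []
    for k in range(n_out):
        t = t0 + k * step
        # Index of the first original timestamp strictly greater than t.
        i = bisect_right(timestamps, t)
        if i >= len(timestamps):
            out.append(values[-1])  # t is at (or rounds past) the last sample
            continue
        t_lo, t_hi = timestamps[i - 1], timestamps[i]
        v_lo, v_hi = values[i - 1], values[i]
        frac = (t - t_lo) / (t_hi - t_lo)
        out.append(v_lo + frac * (v_hi - v_lo))
    return out
```

Linear interpolation is the simplest choice; band-limited (sinc) interpolation would better preserve high-frequency speech content at the cost of more computation.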
Abstract
Modern optical mouse sensors, with their high precision and responsiveness, have an often overlooked vulnerability: they can be exploited for side-channel attacks. This paper introduces Mic-E-Mouse, the first side-channel attack that targets high-performance optical mouse sensors to covertly eavesdrop on users. We demonstrate that audio signals induce subtle surface vibrations that a mouse's optical sensor can detect. Notably, user-space software on popular operating systems can collect and transmit this sensitive side-channel data, giving attackers access to raw mouse data without direct system-level permissions. The vibration signals extracted from raw mouse data are initially of poor quality due to non-uniform sampling, a non-linear frequency response, and significant quantization. To overcome these limitations, Mic-E-Mouse employs an end-to-end data filtering pipeline that combines Wiener filtering, resampling correction, and a novel encoder-only spectrogram neural filtering technique. We evaluate the attack's efficacy across diverse conditions, including speaking volume, mouse polling rate and DPI, surface material, speaker language, and environmental noise. In controlled environments, Mic-E-Mouse improves the signal-to-noise ratio (SNR) for speech reconstruction by up to +19 dB. Our results also show a speech recognition accuracy of roughly 42% to 61% on the AudioMNIST and VCTK datasets. All our code and datasets are publicly available at https://sites.google.com/view/mic-e-mouse.
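Two of the quantities named above can be illustrated with small sketches. The first function is an adaptive local Wiener filter modeled on the update rule used by `scipy.signal.wiener` (with edge windows truncated rather than zero-padded); the paper's Wiener stage may instead operate in the frequency domain, so treat this as an assumption. The second computes SNR in decibels, the metric behind the reported gain: +19 dB corresponds to reducing the error power by a factor of about 10^1.9 ≈ 79 relative to the signal. The synthetic sine-plus-noise input is purely illustrative and is not the paper's data, and `local_wiener` and `snr_db` are hypothetical names.

```python
import math
import random


def local_wiener(x, window=5, noise=None):
    """Adaptive local Wiener filter for a 1-D signal.

    Follows the update rule of scipy.signal.wiener, but truncates the window
    at the edges instead of zero-padding.
    """
    half = window // 2
    n = len(x)
    means, variances = [], []
    for i in range(n):
        seg = x[max(0, i - half): min(n, i + half + 1)]
        m = sum(seg) / len(seg)
        means.append(m)
        # Biased local variance, clamped to avoid tiny negative round-off.
        variances.append(max(0.0, sum(v * v for v in seg) / len(seg) - m * m))
    if noise is None:
        # Estimate the noise power as the average local variance.
        noise = sum(variances) / n
    out = []
    for xi, m, var in zip(x, means, variances):
        if var <= noise:
            out.append(m)  # variance fully explained by noise: use local mean
        else:
            # Keep only the fraction of the deviation not attributed to noise.
            out.append(m + (1.0 - noise / var) * (xi - m))
    return out


def snr_db(clean, estimate):
    """SNR of an estimate against a clean reference, in decibels."""
    signal_power = sum(s * s for s in clean)
    error_power = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_power / error_power)


# Illustrative synthetic input: a slow tone plus white Gaussian noise.
random.seed(0)
clean = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
noisy = [s + random.gauss(0.0, 0.3) for s in clean]
before = snr_db(clean, noisy)
after = snr_db(clean, local_wiener(noisy))
print(f"SNR before: {before:.1f} dB, after: {after:.1f} dB ({after - before:+.1f} dB)")
```

The filter trades some signal detail for noise suppression: wherever the local variance barely exceeds the noise estimate, the output stays close to the local mean.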