Multi-Channel Replay Speech Detection using Acoustic Maps

📅 2026-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a lightweight replay attack detection method for automatic speaker verification systems, which are vulnerable to replay attacks, particularly in real-time voice assistants. The approach uses multi-channel recordings to construct physically interpretable maps of spatial energy distribution via conventional beamforming over a discrete azimuth-elevation grid, capturing the differences in radiation characteristics between human speech and loudspeaker-replayed audio. A compact convolutional neural network with approximately 6,000 trainable parameters then performs replay detection on these acoustic maps. Experiments on the ReMASC dataset show that the method achieves competitive performance across different devices and acoustic environments.
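
As a rough illustration of what a budget of about 6,000 trainable parameters looks like, the sketch below counts the parameters of a hypothetical three-layer CNN. The layer widths, the 3x3 kernels, and the pooled single-output head are assumptions chosen for illustration; they are not the architecture described in the paper.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Trainable parameters of a 2-D convolution with a square kernel."""
    return out_ch * (in_ch * kernel * kernel + (1 if bias else 0))


def linear_params(in_features, out_features, bias=True):
    """Trainable parameters of a fully connected layer."""
    return out_features * (in_features + (1 if bias else 0))


# Hypothetical layout (NOT from the paper): one input map, three 3x3
# convolutions, global average pooling, and a single-logit output layer.
layer_counts = [
    conv2d_params(1, 8, 3),     # 80
    conv2d_params(8, 16, 3),    # 1168
    conv2d_params(16, 32, 3),   # 4640
    linear_params(32, 1),       # 33
]
total = sum(layer_counts)
print(total)  # 5921
```

Most of the budget sits in the last convolution, so the channel widths, rather than the input map size, dominate the count in a design of this kind.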

📝 Abstract
Replay attacks remain a critical vulnerability for automatic speaker verification systems, particularly in real-time voice assistant applications. In this work, we propose acoustic maps as a novel spatial feature representation for replay speech detection from multi-channel recordings. Derived from classical beamforming over discrete azimuth and elevation grids, acoustic maps encode directional energy distributions that reflect physical differences between human speech radiation and loudspeaker-based replay. A lightweight convolutional neural network is designed to operate on this representation, achieving competitive performance on the ReMASC dataset with approximately 6k trainable parameters. Experimental results show that acoustic maps provide a compact and physically interpretable feature space for replay attack detection across different devices and acoustic environments.
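
The idea of an acoustic map can be sketched with a conventional (delay-and-sum) beamformer evaluated on an azimuth-elevation grid. The code below is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a far-field source, a known microphone geometry, a speed of sound of 343 m/s, and a single full-band energy value per direction. The function names `steering_directions` and `acoustic_map` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def steering_directions(azimuths, elevations):
    """Unit vectors for every grid point, shape (E, A, 3); angles in radians."""
    az, el = np.meshgrid(azimuths, elevations)  # each of shape (E, A)
    return np.stack(
        [np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)],
        axis=-1,
    )


def acoustic_map(signals, mic_positions, fs, azimuths, elevations):
    """Delay-and-sum energy per direction.

    signals:       (M, N) time-domain samples, one row per microphone
    mic_positions: (M, 3) microphone coordinates in meters
    returns:       (E, A) map of beamformer output energy
    """
    n_mics, n_samples = signals.shape
    spectra = np.fft.rfft(signals, axis=1)              # (M, F)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)      # (F,)
    dirs = steering_directions(azimuths, elevations)    # (E, A, 3)

    # Far-field arrival delay of each microphone relative to the origin:
    # microphones closer to the source receive the wavefront earlier.
    delays = -(dirs @ mic_positions.T) / SPEED_OF_SOUND  # (E, A, M)

    # Undo the assumed delays in the frequency domain, then average channels.
    phase = np.exp(2j * np.pi * delays[..., None] * freqs)  # (E, A, M, F)
    beam = (phase * spectra).sum(axis=2) / n_mics            # (E, A, F)

    # Sum the beamformer output energy over frequency.
    return (np.abs(beam) ** 2).sum(axis=-1)
```

The resulting two-dimensional array can be treated like a single-channel image, which is what makes a small convolutional network a natural classifier for it. Avoiding elevations of exactly plus or minus 90 degrees keeps grid points from collapsing onto the same direction.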
Problem

Research questions and friction points this paper is trying to address.

replay attack
speaker verification
multi-channel
speech detection
voice assistant
Innovation

Methods, ideas, or system contributions that make the work stand out.

acoustic maps
replay speech detection
multi-channel
beamforming
lightweight CNN
Michael Neri
Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
Tuomas Virtanen
Tampere University
machine listening
audio signal processing
audio