SincQDR-VAD: A Noise-Robust Voice Activity Detection Framework Leveraging Learnable Filters and Ranking-Aware Optimization

📅 2025-08-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the poor robustness of voice activity detection (VAD) under noisy and resource-constrained conditions, and the mismatch between conventional frame-wise classification losses and evaluation metrics such as AUROC, this paper proposes SincQDR-VAD, a compact end-to-end VAD framework. It has two components: (i) a Sinc-extractor front-end whose learnable bandpass filters capture noise-resistant spectral features; and (ii) a quadratic disparity ranking loss that optimizes the pairwise score order between speech and non-speech frames, which targets AUROC more directly than a classification loss does. Experiments on multiple benchmark datasets show consistent improvements (AUROC increases by 1.2–2.8% and F2-score by 3.5–5.1%), while the model uses only 69% of the parameters of prior methods. The approach thus combines improved accuracy with high parameter efficiency.

📝 Abstract

Voice activity detection (VAD) is essential for speech-driven applications, but remains far from perfect in noisy and resource-limited environments. Existing methods often lack robustness to noise, and their frame-wise classification losses are only loosely coupled with the evaluation metric of VAD. To address these challenges, we propose SincQDR-VAD, a compact and robust framework that combines a Sinc-extractor front-end with a novel quadratic disparity ranking loss. The Sinc-extractor uses learnable bandpass filters to capture noise-resistant spectral features, while the ranking loss optimizes the pairwise score order between speech and non-speech frames to improve the area under the receiver operating characteristic curve (AUROC). Experiments on representative benchmark datasets show that our framework considerably improves both AUROC and F2-score while using only 69% of the parameters of prior art, confirming its efficiency and practical viability.
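
To make the idea of a bandpass filter defined by a few learnable numbers concrete, here is a minimal sketch of a windowed-sinc bandpass kernel in plain Python. It assumes a SincNet-style parameterization in which each filter is fully determined by a low and a high cutoff frequency. The abstract does not give the Sinc-extractor's exact design, so the function names, the Hamming window, the 16 kHz sample rate, and the kernel length are all illustrative assumptions.

```python
import math


def sinc_bandpass_kernel(f_low_hz, f_high_hz, sample_rate=16000, kernel_size=101):
    """Windowed-sinc band-pass impulse response set by two cutoff frequencies.

    In a learnable front-end, f_low_hz and f_high_hz would be the trainable
    parameters; here they are plain floats (illustrative assumption).
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel is symmetric")
    if not 0 <= f_low_hz < f_high_hz <= sample_rate / 2:
        raise ValueError("need 0 <= f_low_hz < f_high_hz <= Nyquist")

    f1 = f_low_hz / sample_rate   # normalized cutoffs in cycles per sample
    f2 = f_high_hz / sample_rate
    half = (kernel_size - 1) // 2
    kernel = []
    for i in range(kernel_size):
        n = i - half
        if n == 0:
            ideal = 2.0 * (f2 - f1)
        else:
            # Difference of two ideal low-pass responses gives a band-pass.
            ideal = (math.sin(2 * math.pi * f2 * n)
                     - math.sin(2 * math.pi * f1 * n)) / (math.pi * n)
        # Hamming window to reduce ripple from truncating the ideal response.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel


def gain_at(kernel, freq_hz, sample_rate=16000):
    """Magnitude response of the kernel at a single frequency."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(h * math.cos(w * n) for n, h in enumerate(kernel))
    im = sum(h * math.sin(w * n) for n, h in enumerate(kernel))
    return math.hypot(re, im)


if __name__ == "__main__":
    k = sinc_bandpass_kernel(300.0, 3400.0)
    print(gain_at(k, 1500.0), gain_at(k, 6000.0))
```

Each filter in this sketch is described by two scalars rather than one weight per tap, which is one plausible source of the parameter savings the abstract reports.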

Problem

Research questions and friction points this paper is trying to address.

Enhancing noise robustness in voice activity detection

Improving frame-wise classification with ranking-aware optimization

Reducing model parameters while maintaining performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learnable bandpass filters for noise-resistant features

Quadratic disparity ranking loss for AUROC optimization

Compact framework with reduced parameter usage
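
The link between pairwise ranking and AUROC can be sketched in a few lines of Python. AUROC equals the fraction of (speech, non-speech) frame pairs in which the speech frame receives the higher score, with ties counted as one half. The loss below is a generic squared-hinge pairwise surrogate chosen for illustration. It is not taken from the paper, whose exact quadratic disparity ranking loss is not given in the abstract; the function names and the `margin` parameter are assumptions.

```python
def _split_scores(scores, labels):
    """Separate scores into speech (label 1) and non-speech (label 0) lists."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one speech and one non-speech frame")
    return pos, neg


def pairwise_auroc(scores, labels):
    """AUROC as the fraction of correctly ordered (speech, non-speech) pairs."""
    pos, neg = _split_scores(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def quadratic_ranking_loss(scores, labels, margin=1.0):
    """Illustrative squared-hinge pairwise loss (an assumed form, not the paper's).

    A pair is penalized quadratically when the speech score does not exceed
    the non-speech score by at least `margin`.
    """
    pos, neg = _split_scores(scores, labels)
    total = sum(max(0.0, margin - (p - n)) ** 2 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]
    labels = [1, 0, 1, 0]
    print(pairwise_auroc(scores, labels))
    print(quadratic_ranking_loss(scores, labels, margin=0.5))
```

Unlike the pair-counting AUROC, this surrogate is differentiable almost everywhere in the scores, so it can be minimized with gradient descent. Enumerating all pairs is quadratic in the number of frames, so a practical version would likely sample pairs or work within a mini-batch.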
