Window Size Versus Accuracy Experiments in Voice Activity Detectors

📅 2026-01-24
🤖 AI Summary
This study examines how window size and hysteresis affect the accuracy of voice activity detection (VAD). The authors evaluate three VAD algorithms (Silero, WebRTC, and RMS) on a set of diverse real-world digital audio streams, varying the window size and applying hysteresis on top of each detector's output. Silero significantly outperforms WebRTC and RMS, and hysteresis provides a benefit for WebRTC. The results offer practical references for tuning VAD parameters in deployed systems.

📝 Abstract
Voice activity detection (VAD) plays a vital role in enabling applications such as speech recognition. We analyze the impact of window size on the accuracy of three VAD algorithms, Silero, WebRTC, and Root Mean Square (RMS), across a set of diverse real-world digital audio streams. We additionally explore the use of hysteresis on top of each VAD output. Our results offer practical references for optimizing VAD systems. Silero significantly outperforms WebRTC and RMS, and hysteresis provides a benefit for WebRTC.
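
To make the two parameters in the abstract concrete, here is a minimal Python sketch of an RMS-based detector with a configurable window size, followed by a hysteresis pass over its per-window decisions. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how hysteresis is defined, so the count-based scheme below is one common choice, and the names `rms_vad`, `apply_hysteresis`, `threshold`, `on_count`, and `off_count` are all hypothetical.

```python
import math


def rms_vad(samples, window_size, threshold):
    """Label each non-overlapping window as speech (True) or silence (False).

    A window counts as speech when its root-mean-square amplitude is at
    least `threshold`. Trailing samples that do not fill a whole window
    are ignored. Both parameters are illustrative assumptions.
    """
    decisions = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        rms = math.sqrt(sum(x * x for x in window) / window_size)
        decisions.append(rms >= threshold)
    return decisions


def apply_hysteresis(decisions, on_count, off_count):
    """Smooth raw VAD decisions with a count-based hysteresis (assumed scheme).

    The output switches to speech only after `on_count` consecutive speech
    windows, and back to silence only after `off_count` consecutive
    silence windows. The output starts in the silence state.
    """
    state = False
    run = 0  # consecutive windows that disagree with the current state
    smoothed = []
    for decision in decisions:
        if decision != state:
            run += 1
            needed = on_count if decision else off_count
            if run >= needed:
                state = decision
                run = 0
        else:
            run = 0
        smoothed.append(state)
    return smoothed


if __name__ == "__main__":
    # Synthetic signal: one silent window, then one loud window.
    signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5]
    raw = rms_vad(signal, window_size=4, threshold=0.1)
    print(raw)
    print(apply_hysteresis(raw, on_count=1, off_count=2))
```

Larger windows average over more samples, which steadies the RMS estimate but coarsens the time resolution of each decision. The hysteresis pass trades onset and offset latency for fewer spurious state flips, and the same function can wrap the output of any detector that emits one boolean per window.
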
Problem

Research questions and friction points this paper is trying to address.

Voice Activity Detection
Window Size
Accuracy
Hysteresis
Speech Recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Voice Activity Detection
Window Size
Hysteresis
Silero
Accuracy Optimization