🤖 AI Summary
This study addresses the inefficiency and poor reproducibility of subjective speech quality assessment (SSQA). To this end, we introduce SHEET, an open-source toolkit for end-to-end, data-driven training and evaluation of deep neural networks that predict human-rated speech quality scores. SHEET establishes, for the first time, a unified multi-dataset (e.g., BVCC, NISQA) and multi-model framework, integrating plug-and-play pretrained self-supervised learning (SSL) models from Torch Hub and Hugging Face. Through a systematic re-evaluation of SSL-MOS, we identify superior speech SSL representations: the best-performing SSL model surpasses the original SSL-MOS on both the BVCC and NISQA benchmarks and matches state-of-the-art methods. The toolkit is publicly deployed on Hugging Face Spaces, substantially lowering the barrier to entry for SSQA research.
📝 Abstract
We introduce SHEET, a multi-purpose open-source toolkit designed to accelerate subjective speech quality assessment (SSQA) research. SHEET stands for the Speech Human Evaluation Estimation Toolkit, and it focuses on data-driven deep neural network-based models trained to predict human-labeled quality scores of speech samples. SHEET provides comprehensive training and evaluation scripts, multi-dataset and multi-model support, as well as pre-trained models accessible via Torch Hub and Hugging Face Spaces. To demonstrate its capabilities, we re-evaluated SSL-MOS, a speech self-supervised learning (SSL)-based SSQA model widely used in recent scientific papers, on an extensive list of speech SSL models. Experiments were conducted on two representative SSQA datasets, BVCC and NISQA, and we identified the best-performing speech SSL model, which surpassed the original SSL-MOS implementation and performed comparably to state-of-the-art methods.
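To make the model family concrete, below is a minimal sketch of an SSL-MOS-style predictor as described in the literature: frame-level features from a speech SSL encoder are mean-pooled over time and mapped to a scalar mean opinion score by a linear head. This is an illustrative sketch, not SHEET's actual implementation; the `DummyEncoder` is a stand-in for a real pretrained SSL model (e.g., wav2vec 2.0 loaded via Torch Hub), and all class and parameter names here are hypothetical.

```python
# Hypothetical sketch of an SSL-MOS-style architecture (not SHEET's code):
# SSL frame features -> temporal mean pooling -> linear regression head.
import torch
import torch.nn as nn


class SSLMOSPredictor(nn.Module):
    def __init__(self, ssl_encoder: nn.Module, feature_dim: int):
        super().__init__()
        self.ssl_encoder = ssl_encoder        # pretrained SSL model (frozen or fine-tuned)
        self.head = nn.Linear(feature_dim, 1)  # maps pooled features to a scalar MOS

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        feats = self.ssl_encoder(waveform)    # (batch, frames, feature_dim)
        pooled = feats.mean(dim=1)            # average over the time axis
        return self.head(pooled).squeeze(-1)  # (batch,) predicted quality scores


class DummyEncoder(nn.Module):
    """Stand-in for a speech SSL model: chops audio into frames and projects them."""

    def __init__(self, dim: int = 768, hop: int = 320):
        super().__init__()
        self.hop = hop
        self.proj = nn.Linear(hop, dim)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        usable = wav.shape[1] // self.hop * self.hop
        frames = wav[:, :usable].reshape(wav.shape[0], -1, self.hop)
        return self.proj(frames)


model = SSLMOSPredictor(DummyEncoder(), feature_dim=768)
scores = model(torch.randn(2, 16000))  # two 1-second clips at 16 kHz
print(scores.shape)  # torch.Size([2])
```

Training such a model then reduces to regressing `scores` against human-labeled MOS values with a standard loss (e.g., L1 or clipped MSE), which is the kind of recipe a toolkit like SHEET packages end to end.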