🤖 AI Summary
The audio/speech coding community has long lacked a unified, open-source, and reproducible evaluation benchmark; existing evaluations rely on proprietary or small-scale datasets, leading to unfair, non-reproducible comparisons between traditional DSP-based and machine learning (ML)-based codecs.
Method: We introduce OpenACE, an open-source, full-band, content-diverse audio coding quality benchmark featuring standardized test vectors, support for emerging scenarios (e.g., emotional speech, Bluetooth LE Audio with LC3/LC3+), and integration of mainstream codecs (Opus, EVS, LC3) with objective metrics (PESQ, ViSQOL) and subjective listening tests.
Contribution/Results: OpenACE enables fair, standardized comparison between DSP and ML codecs. Experiments uncover notable quality degradation in emotional speech coding at 16 kbps. The benchmark is publicly released to support reproducible cross-algorithm and cross-distribution evaluation.
📝 Abstract
Audio and speech coding lack unified evaluation benchmarks and open-source testing infrastructure. Many candidate systems have been evaluated on proprietary, non-reproducible, or small datasets, and machine learning-based codecs are often tested on data distributed similarly to their training data, which is an unfair comparison with digital signal processing-based codecs that usually handle unseen data well. This paper presents a full-band audio and speech coding quality benchmark with more varied content types, including traditional open test vectors. An example use case of audio coding quality assessment is presented with the open-source Opus codec, 3GPP's EVS, and ETSI's recent LC3 and LC3+, which are used in Bluetooth LE Audio profiles. In addition, quality variations in emotional speech coding at 16 kbps are shown. The proposed open-source benchmark contributes to the democratization of audio and speech coding and is available at https://github.com/JozefColdenhoff/OpenACE.
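The abstract's core evaluation idea is comparing a codec's degraded output against the clean reference with an objective quality metric. The paper uses standardized metrics such as PESQ and ViSQOL; as a hedged illustration only, the sketch below uses segmental SNR, a much simpler stand-in metric, to show the reference-vs-degraded comparison pattern. All names here (`segmental_snr`, the toy quantizer) are hypothetical and not from the benchmark itself.

```python
import math

def segmental_snr(reference, degraded, frame=160):
    """Average per-frame SNR in dB between a reference signal and a
    degraded (e.g. codec-processed) version. A simplified stand-in for
    metrics like PESQ/ViSQOL, for illustration only."""
    assert len(reference) == len(degraded), "signals must be aligned"
    snrs = []
    for start in range(0, len(reference) - frame + 1, frame):
        ref = reference[start:start + frame]
        deg = degraded[start:start + frame]
        sig = sum(r * r for r in ref)                      # frame signal energy
        err = sum((r - d) ** 2 for r, d in zip(ref, deg))  # frame error energy
        if sig > 0 and err > 0:
            snrs.append(10 * math.log10(sig / err))
    return sum(snrs) / len(snrs) if snrs else float("inf")

# Toy usage: a 1 kHz tone at 16 kHz sampling, "coded" by crude quantization
# standing in for a real codec such as Opus or LC3.
fs = 16000
ref = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
fine = [round(x * 128) / 128 for x in ref]   # finer quantization, less distortion
coarse = [round(x * 16) / 16 for x in ref]   # coarser quantization, more distortion
print(segmental_snr(ref, fine), segmental_snr(ref, coarse))
```

A real benchmark run would replace the toy quantizer with actual encode/decode round trips at fixed bitrates and a standardized metric, but the shape of the comparison is the same.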