A Suite for Acoustic Language Model Evaluation

📅 2024-09-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current speech-language models (SLMs) lack systematic evaluation tools for non-semantic acoustic attributes—such as background noise, speaker identity, emotion, and room impulse response—and their interplay with textual content. To address this, we introduce SALMon, the first unified benchmarking suite explicitly designed to assess SLMs across diverse non-semantic acoustic dimensions. SALMon employs discriminative scoring instead of generative sampling, significantly improving evaluation efficiency and consistency. It integrates a high-quality, expert-annotated acoustic attribute dataset with multi-dimensional, controllable acoustic perturbation injection to jointly evaluate both acoustic attribute fidelity and text-acoustic alignment. We conduct comprehensive benchmarking across state-of-the-art SLMs, precisely characterizing their capabilities and limitations along each acoustic dimension. All code and data are publicly released to foster reproducible research and community advancement.

📝 Abstract
Speech language models have recently demonstrated great potential as universal speech processing systems. Such models can capture the rich acoustic information present in audio signals beyond the spoken content, such as emotion and background noise. Despite this, evaluation benchmarks that assess awareness of a wide range of acoustic aspects are lacking. To help bridge this gap, we introduce SALMon, a novel evaluation suite encompassing background noise, emotion, speaker identity, and room impulse response. The proposed benchmarks evaluate both the consistency of the inspected element and how well it matches the spoken text. We follow a modelling-based approach, measuring whether a model assigns correct samples higher scores than incorrect ones. This approach makes the benchmark fast to compute, even for large models. We evaluated several speech language models on SALMon, highlighting the strengths and weaknesses of each evaluated method. We make the code and data publicly available at https://pages.cs.huji.ac.il/adiyoss-lab/salmon/.
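The modelling-based approach described above can be illustrated with a minimal sketch: for each benchmark item, the evaluated model scores a correct (acoustically consistent) sample and a perturbed one, and accuracy is the fraction of pairs where the correct sample wins. The scorer below is a hypothetical stand-in; in practice one would plug in an SLM's (e.g., length-normalized) log-likelihood over audio tokens.

```python
from typing import Callable, Sequence, Tuple

def pairwise_accuracy(
    pairs: Sequence[Tuple[Sequence[int], Sequence[int]]],
    score: Callable[[Sequence[int]], float],
) -> float:
    """Fraction of pairs where the consistent (positive) sample
    receives a strictly higher score than the perturbed (negative) one."""
    wins = sum(1 for pos, neg in pairs if score(pos) > score(neg))
    return wins / len(pairs)

def toy_score(tokens: Sequence[int]) -> float:
    # Illustrative stand-in scorer (not from the paper): prefer
    # sequences with fewer abrupt jumps, a crude proxy for
    # acoustic consistency in this toy example.
    return -float(sum(abs(a - b) for a, b in zip(tokens, tokens[1:])))

# Toy (positive, negative) token-sequence pairs.
pairs = [
    ([1, 1, 2, 2], [1, 9, 2, 8]),  # positive is smoother
    ([3, 3, 3, 3], [3, 0, 7, 1]),
]
print(pairwise_accuracy(pairs, toy_score))  # → 1.0
```

Because scoring requires only forward passes rather than generation and resynthesis, this style of evaluation stays cheap even for large models.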
Problem

Research questions and friction points this paper is trying to address.

Speech Recognition Evaluation
Background Noise Impact
Speaker-specific Attributes
Innovation

Methods, ideas, or system contributions that make the work stand out.

SALMon toolkit
Comprehensive evaluation
Speech understanding