PRiSM: Benchmarking Phone Realization in Speech Models

📅 2026-01-20

📈 Citations: 1

✨ Influential: 0


🤖 AI Summary

Current evaluations of phone recognition (PR) systems are largely confined to surface-level transcription accuracy and offer limited insight into the systems' underlying phonetic perception. This work proposes PRiSM, the first open-source benchmark for phone recognition. It establishes a standardized evaluation framework that combines intrinsic assessment (standardized transcription-based evaluation) with extrinsic assessment (downstream utility in clinical, educational, and multilingual settings, measured with transcription and representation probes). The benchmark enables reproducible evaluation and reveals three findings: diverse language exposure during training is key to PR performance, encoder-CTC models are the most stable, and specialized PR models still outperform general-purpose large audio language models.


📝 Abstract

Phone recognition (PR) serves as the atomic interface for language-agnostic modeling in cross-lingual speech processing and phonetic analysis. Despite prolonged efforts in developing PR systems, current evaluations only measure surface-level transcription accuracy. We introduce PRiSM, the first open-source benchmark designed to expose blind spots in phonetic perception through intrinsic and extrinsic evaluation of PR systems. PRiSM standardizes transcription-based evaluation and assesses downstream utility in clinical, educational, and multilingual settings with transcription and representation probes. We find that diverse language exposure during training is key to PR performance, encoder-CTC models are the most stable, and specialized PR models still outperform large audio language models. PRiSM releases code, recipes, and datasets to move the field toward multilingual speech models with robust phonetic ability: https://github.com/changelinglab/prism.
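The abstract does not spell out which transcription metrics PRiSM standardizes; the authoritative definitions are in the linked repository. As a hedged illustration of what transcription-based evaluation of a PR system typically involves, the sketch below computes phone error rate (PER): the Levenshtein distance between reference and hypothesis phone sequences, summed over a corpus and divided by the total number of reference phones. Treating PER as PRiSM's metric is an assumption, and the function names `edit_distance` and `phone_error_rate` are hypothetical, not part of PRiSM's code. Phones are represented as lists of strings because a single IPA phone can span several characters (for example "tʃ").

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over phone tokens; each substitution,
    insertion, or deletion costs 1."""
    # prev[j] = distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference phone r
                curr[j - 1] + 1,         # insert hypothesis phone h
                prev[j - 1] + (r != h),  # substitute (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level PER: total edits divided by total reference phones
    (hypothetical helper, not PRiSM's API)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference transcriptions contain no phones")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / total


if __name__ == "__main__":
    refs = [["k", "æ", "t"], ["d", "ɔ", "g"]]
    hyps = [["k", "a", "t"], ["d", "ɔ", "g", "z"]]  # one substitution, one insertion
    print(phone_error_rate(refs, hyps))
```

Pooling edits across the corpus before dividing, rather than averaging per-utterance rates, keeps short utterances from dominating the score; this is a common convention, but whether PRiSM aggregates this way is likewise an assumption.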

Problem

Research questions and friction points this paper is trying to address.

Phone Recognition

Speech Models

Phonetic Perception

Multilingual Speech Processing

Benchmarking

Innovation

Methods, ideas, or system contributions that make the work stand out.

Phone Recognition

Benchmarking

Multilingual Speech Modeling

Phonetic Evaluation

Audio Language Models
