🤖 AI Summary
This study addresses key limitations of traditional automatic scoring for L2 spoken language assessment, namely the information loss of cascaded pipelines and the constraints of end-to-end graders, by applying multimodal speech large language models (Speech-LLMs) as oral proficiency graders. Methodologically, it builds on the audio-understanding knowledge acquired during LLM pre-training, compares fine-tuning strategies based on regression and classification targets, and evaluates the trained grader in cross-part and cross-task settings to test generalization. Key contributions include: (1) the first application of Speech-LLMs to L2 speaking assessment; (2) avoiding the information loss of cascaded systems and the limitations of end-to-end graders by modeling fine-grained acoustic and semantic information directly from the speech signal; and (3) outperforming competitive baselines, including statistical models, text encoders, and self-supervised speech models, on two datasets, with strong generalization across parts and tasks.
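As a rough illustration of what regression- and classification-target fine-tuning can look like in practice, the sketch below attaches two scoring heads to a pretrained speech encoder. The `speech_encoder` stand-in, the mean-pooling, the six-level banding, and the loss weighting `alpha` are all assumptions made for illustration; the paper's actual backbone, head design, and training setup may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpeechLLMGrader(nn.Module):
    """Pretrained speech encoder with a regression and a classification scoring head.

    Hypothetical sketch: `speech_encoder` stands in for the speech-LLM backbone
    and is assumed to return hidden states of shape (batch, time, hidden_dim).
    """

    def __init__(self, speech_encoder: nn.Module, hidden_dim: int, num_levels: int = 6):
        super().__init__()
        self.encoder = speech_encoder
        self.regression_head = nn.Linear(hidden_dim, 1)               # continuous proficiency score
        self.classification_head = nn.Linear(hidden_dim, num_levels)  # discrete proficiency band

    def forward(self, audio_features: torch.Tensor):
        hidden = self.encoder(audio_features)   # (batch, time, hidden_dim)
        pooled = hidden.mean(dim=1)             # mean-pool over time (illustrative choice)
        score = self.regression_head(pooled).squeeze(-1)
        logits = self.classification_head(pooled)
        return score, logits


def combined_loss(score, logits, target_score, target_level, alpha: float = 0.5):
    """Weighted sum of the regression and classification objectives.

    The fixed weighting is purely illustrative; the paper compares training
    strategies rather than prescribing a single combination.
    """
    reg_loss = F.mse_loss(score, target_score)
    cls_loss = F.cross_entropy(logits, target_level)
    return alpha * reg_loss + (1.0 - alpha) * cls_loss


# Example with a stand-in encoder (real use would plug in the speech-LLM backbone).
if __name__ == "__main__":
    encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU())  # maps (batch, time, 80) features
    grader = SpeechLLMGrader(encoder, hidden_dim=256)
    score, logits = grader(torch.randn(4, 120, 80))          # 4 utterances, 120 frames each
    loss = combined_loss(score, logits, torch.rand(4) * 6, torch.randint(0, 6, (4,)))
```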
📝 Abstract
The growing population of L2 English speakers has increased the demand for automatic graders for spoken language assessment (SLA). Historically, statistical models, text encoders, and self-supervised speech models have been utilised for this task. However, cascaded systems suffer from information loss, while end-to-end (E2E) graders have their own limitations. With the recent advances in multi-modal large language models (LLMs), we aim to explore their potential as L2 oral proficiency graders and overcome these issues. In this work, we compare various training strategies using regression and classification targets. Our results show that speech LLMs outperform all previous competitive baselines, achieving superior performance on two datasets. Furthermore, the trained grader demonstrates strong generalisation capabilities in cross-part and cross-task evaluation, facilitated by the audio understanding knowledge acquired during LLM pre-training.
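A minimal sketch of how a cross-part or cross-task evaluation might be organised: score responses from a part (or task) the grader was not trained on and report agreement with human reference scores. The `grader.predict` interface, the `parts` data layout, and the choice of metrics are hypothetical; the datasets and metrics reported in the paper may differ.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def evaluate_cross_part(grader, parts: dict) -> dict:
    """Evaluate a trained grader on speaking parts it was not trained on.

    `parts` maps a part name (e.g. "part1") to a list of {"audio": ..., "score": ...}
    items; this structure is illustrative, not the datasets' actual format.
    """
    results = {}
    for name, items in parts.items():
        preds = np.array([grader.predict(item["audio"]) for item in items])
        refs = np.array([item["score"] for item in items])
        results[name] = {
            "pcc": pearsonr(preds, refs)[0],    # Pearson correlation with human scores
            "srcc": spearmanr(preds, refs)[0],  # Spearman rank correlation
            "rmse": float(np.sqrt(np.mean((preds - refs) ** 2))),
        }
    return results
```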