🤖 AI Summary
This work addresses an inefficiency in current large language model (LLM)-based automatic speech recognition (ASR) systems: training a separate connector for each language disregards phylogenetic relationships among languages, leading to parameter redundancy and limited generalization. To overcome this, the study is the first to incorporate language-family information into the design of LLM-ASR connectors. It proposes a lightweight connector, situated between a frozen speech encoder and a pretrained LLM and shared across all languages in a family, enabling knowledge transfer within that family. The method substantially reduces parameter count while improving cross-lingual recognition performance on two multilingual LLMs and two real-world speech corpora, achieving both deployment efficiency and stronger generalization.
📝 Abstract
Large Language Model (LLM)-powered Automatic Speech Recognition (ASR) systems achieve strong performance with limited resources by linking a frozen speech encoder to a pretrained LLM via a lightweight connector. Prior work trains a separate connector per language, overlooking linguistic relatedness. We propose an efficient and novel connector-sharing strategy based on linguistic family membership, enabling one connector per family, and empirically validate its effectiveness across two multilingual LLMs and two real-world corpora spanning curated and crowd-sourced speech. Our results show that family-based connectors reduce parameter count while improving generalization across domains, offering a practical and scalable strategy for multilingual ASR deployment.
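The connector-sharing idea can be illustrated with a minimal sketch. All names below (the language-to-family map, `Connector`, `FamilySharedConnectors`) are hypothetical and not from the paper; the point is only that languages in the same family reuse one set of connector parameters, so the number of trained connectors scales with families rather than languages.

```python
import numpy as np

# Illustrative (assumed) language-to-family map; the paper's actual
# family grouping may differ.
LANG_TO_FAMILY = {
    "es": "romance", "it": "romance", "pt": "romance",
    "de": "germanic", "nl": "germanic",
}

class Connector:
    """Minimal linear projection from encoder dim to LLM embedding dim."""
    def __init__(self, enc_dim, llm_dim, rng):
        self.W = rng.standard_normal((enc_dim, llm_dim)) * 0.01

    def __call__(self, speech_features):
        # speech_features: (frames, enc_dim) -> (frames, llm_dim)
        return speech_features @ self.W

class FamilySharedConnectors:
    """Lazily builds one connector per language family; languages in
    the same family share the same parameters."""
    def __init__(self, enc_dim=8, llm_dim=16, seed=0):
        self.enc_dim, self.llm_dim = enc_dim, llm_dim
        self.rng = np.random.default_rng(seed)
        self._by_family = {}

    def get(self, lang):
        fam = LANG_TO_FAMILY[lang]
        if fam not in self._by_family:
            self._by_family[fam] = Connector(self.enc_dim, self.llm_dim, self.rng)
        return self._by_family[fam]

pool = FamilySharedConnectors()
print(pool.get("es") is pool.get("it"))  # same family -> shared connector
print(pool.get("es") is pool.get("de"))  # different family -> separate
print(len(pool._by_family))              # connectors created: 2, not 4
```

With five languages in two families, only two connectors are instantiated, which is the parameter saving the abstract describes; in a real system each connector would be trained while the speech encoder and LLM stay frozen.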