🤖 AI Summary
Multilingual large language models (LLMs) frequently deviate from the target language—especially for low-resource languages—undermining language adherence. This work introduces CoCo-CoLa, a novel metric for evaluating language adherence, and uses it to show that the final output layers and other language-specific components are primary drivers of language selection bias. Building on this insight, the authors propose a selective fine-tuning strategy that updates only these critical layers, substantially reducing computational overhead while preserving or improving performance. Evaluation spanning closed-book question answering, multilingual fine-tuning, and layer-wise interpretability analysis shows that the method matches or exceeds full-parameter fine-tuning on language consistency across seven languages, with particularly pronounced gains for low-resource ones. The core contributions are: (1) an interpretable account of language selection bias; and (2) an efficient, lightweight paradigm for improving language consistency.
📝 Abstract
Multilingual Large Language Models (LLMs) develop cross-lingual abilities despite being trained on limited parallel data. However, they often struggle to generate responses in the intended language, favoring high-resource languages such as English. In this work, we introduce CoCo-CoLa (Correct Concept - Correct Language), a novel metric to evaluate language adherence in multilingual LLMs. Using fine-tuning experiments on a closed-book QA task across seven languages, we analyze how training in one language affects performance in the others. Our findings reveal that multilingual models share task knowledge across languages but exhibit biases in the selection of output language. We identify language-specific layers, showing that the final layers play a crucial role in determining the output language. Accordingly, we propose a partial training strategy that selectively fine-tunes key layers, improving language adherence while significantly reducing computational cost. Our method achieves comparable or superior performance to full fine-tuning, particularly for low-resource languages, offering a more efficient path to multilingual adaptation.
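The partial training strategy described above can be sketched in PyTorch: freeze every parameter, then unfreeze only the last few transformer blocks (and the output head). Note this is a minimal illustrative sketch, not the paper's implementation; `TinyLM`, `freeze_except_final`, and the choice of `k` are hypothetical stand-ins, since the paper's actual architecture and layer selection are not specified here.

```python
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy stand-in for a multilingual LM: embedding, a stack of
    transformer blocks, and an output head (hypothetical architecture)."""
    def __init__(self, vocab=100, d=32, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        self.head = nn.Linear(d, vocab)

    def forward(self, x):
        h = self.embed(x)
        for blk in self.blocks:
            h = blk(h)
        return self.head(h)

def freeze_except_final(model, k=2, train_head=True):
    """Freeze all parameters, then unfreeze the last k blocks
    (and optionally the output head) for selective fine-tuning."""
    for p in model.parameters():
        p.requires_grad = False
    for blk in model.blocks[-k:]:
        for p in blk.parameters():
            p.requires_grad = True
    if train_head:
        for p in model.head.parameters():
            p.requires_grad = True

model = TinyLM()
freeze_except_final(model, k=2)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable}/{total}")
```

After this setup, passing `filter(lambda p: p.requires_grad, model.parameters())` to the optimizer trains only the selected layers, which is where the computational savings come from.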