ReCoVeR the Target Language: Language Steering without Sacrificing Task Performance

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multilingual large language models (LLMs) commonly suffer from language confusion: they generate responses in a language inconsistent with the prompt or the language explicitly requested by the user. To address this, the paper proposes a lightweight language-guidance method that enables fine-grained language control via language-specific steering vectors. These vectors are extracted without supervision from a multilingual parallel corpus and applied to the model's hidden representations through fixed (training-free) mappings as well as trainable steering functions. The key contribution is mitigating language confusion without compromising task performance: experiments across 18 languages and three multilingual benchmarks show substantial reductions in language-confusion rates while maintaining or improving task accuracy, demonstrating both efficacy and cross-lingual generalizability.

📝 Abstract
As they become increasingly multilingual, Large Language Models (LLMs) exhibit more language confusion, i.e., they tend to generate answers in a language different from the language of the prompt or the answer language explicitly requested by the user. In this work, we propose ReCoVeR (REducing language COnfusion in VEctor Representations), a novel lightweight approach for reducing language confusion based on language-specific steering vectors. We first isolate language vectors with the help of a multi-parallel corpus and then leverage those vectors for effective LLM steering via fixed (i.e., unsupervised) as well as trainable steering functions. Our extensive evaluation, encompassing three benchmarks and 18 languages, shows that ReCoVeR effectively mitigates language confusion in both monolingual and cross-lingual setups while, in contrast to prior language steering methods, retaining task performance. Our code and data are available at https://github.com/hSterz/recover.
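The steering idea described in the abstract can be sketched in a few lines. The following is a minimal illustration with NumPy, under assumptions not taken from the paper: the function names, the mean-difference estimator for isolating language vectors, and the scaling factor `alpha` are all hypothetical simplifications of the fixed (unsupervised) variant, not ReCoVeR's exact formulation.

```python
import numpy as np

def language_vectors(hidden_by_lang):
    """Estimate one steering vector per language from a multi-parallel corpus.

    hidden_by_lang: dict mapping language code -> array of shape
    (n_sentences, d) holding hidden states for the *same* sentences in
    each language. Because the sentences are semantically aligned, the
    mean over all languages approximates the language-neutral content;
    subtracting it leaves a language-specific direction.
    """
    means = {lang: h.mean(axis=0) for lang, h in hidden_by_lang.items()}
    centroid = np.mean(list(means.values()), axis=0)
    return {lang: m - centroid for lang, m in means.items()}

def steer(hidden, vec_src, vec_tgt, alpha=1.0):
    """Fixed (training-free) steering of a hidden state: remove the
    source-language direction and add the target-language one."""
    return hidden - vec_src + alpha * vec_tgt
```

A trainable variant would replace the fixed arithmetic in `steer` with a learned function of the hidden state and the target-language vector; the sketch above only covers the unsupervised case.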
Problem

Research questions and friction points this paper is trying to address.

Reducing language confusion in multilingual LLMs
Maintaining task performance during language steering
Isolating language vectors for effective model control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-specific steering vectors reduce confusion
Fixed and trainable steering functions for LLMs
Retains task performance while mitigating language issues