ReCoVeR the Target Language: Language Steering without Sacrificing Task Performance

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multilingual large language models (LLMs) commonly suffer from language confusion: they generate responses in a language inconsistent with the prompt or the language explicitly requested by the user. To address this, the paper proposes a lightweight language-guidance method that enables fine-grained language control via language-specific steering vectors. These vectors are extracted without supervision from a multilingual parallel corpus and applied to the model's hidden representations through fixed (training-free) mappings as well as trainable steering functions. The key contribution is mitigating language confusion without compromising task performance: experiments across 18 languages and three multilingual benchmarks show substantial reductions in language-confusion rates while maintaining or improving task accuracy, demonstrating both efficacy and cross-lingual generalizability.

📝 Abstract
As they become increasingly multilingual, Large Language Models (LLMs) exhibit more language confusion, i.e., they tend to generate answers in a language different from the language of the prompt or the answer language explicitly requested by the user. In this work, we propose ReCoVeR (REducing language COnfusion in VEctor Representations), a novel lightweight approach for reducing language confusion based on language-specific steering vectors. We first isolate language vectors with the help of a multi-parallel corpus and then leverage those vectors for effective LLM steering via fixed (i.e., unsupervised) as well as trainable steering functions. Our extensive evaluation, encompassing three benchmarks and 18 languages, shows that ReCoVeR effectively mitigates language confusion in both monolingual and cross-lingual setups while, in contrast to prior language steering methods, retaining task performance. Our code and data are available at https://github.com/hSterz/recover.
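The steering idea described in the abstract can be sketched in a few lines. The following is a minimal illustration with NumPy, under assumptions not taken from the paper: the function names, the mean-difference estimator for isolating language vectors, and the scaling factor `alpha` are all hypothetical simplifications of the fixed (unsupervised) variant, not ReCoVeR's exact formulation.

```python
import numpy as np

def language_vectors(hidden_by_lang):
    """Estimate one steering vector per language from a multi-parallel corpus.

    hidden_by_lang: dict mapping language code -> array of shape
    (n_sentences, d) holding hidden states for the *same* sentences in
    each language. Because the sentences are semantically aligned, the
    mean over all languages approximates the language-neutral content;
    subtracting it leaves a language-specific direction.
    """
    means = {lang: h.mean(axis=0) for lang, h in hidden_by_lang.items()}
    centroid = np.mean(list(means.values()), axis=0)
    return {lang: m - centroid for lang, m in means.items()}

def steer(hidden, vec_src, vec_tgt, alpha=1.0):
    """Fixed (training-free) steering of a hidden state: remove the
    source-language direction and add the target-language one."""
    return hidden - vec_src + alpha * vec_tgt
```

A trainable variant would replace the fixed arithmetic in `steer` with a learned function of the hidden state and the target-language vector; the sketch above only covers the unsupervised case.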
Problem

Research questions and friction points this paper is trying to address.

Reducing language confusion in multilingual LLMs
Maintaining task performance during language steering
Isolating language vectors for effective model control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-specific steering vectors reduce confusion
Fixed and trainable steering functions for LLMs
Retains task performance while mitigating language issues