Improving Multilingual Language Models by Aligning Representations through Steering

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the hierarchical representation mechanisms underlying multilingual token processing in large language models (LLMs). Addressing the problem of cross-lingual representation misalignment, we propose single-layer representation steering: injecting learnable vectors into activations at a specific transformer layer to align representations across languages. Experiments demonstrate, for the first time, that intervening at only one layer suffices to substantially improve multilingual understanding and generation, achieving performance on par with translation-based baselines and significantly outperforming state-of-the-art prompt optimization methods. Further analysis reveals that supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) enhance multilingual capability primarily through implicit alignment of the internal representation space. Our approach establishes a lightweight, interpretable, translation-free paradigm for multilingual capability enhancement, requiring only a small learned steering vector and no external translation modules.

📝 Abstract
In this paper, we investigate how large language models (LLMs) process non-English tokens within their layer representations, an open question despite significant advancements in the field. Using representation steering, specifically by adding a learned vector to a single model layer's activations, we demonstrate that steering a single model layer can notably enhance performance. Our analysis shows that this approach achieves results comparable to translation baselines and surpasses state-of-the-art prompt optimization methods. Additionally, we highlight how advanced techniques like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) improve multilingual capabilities by altering representation spaces. We further illustrate how these methods align with our approach to reshaping LLMs' layer representations.
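The core mechanism described above, adding a learned vector to the activations of one chosen layer, can be sketched with a toy stack of layers. This is a minimal NumPy illustration, not the paper's implementation: the function name, the toy `tanh` layers, and the random steering vector are all assumptions made for demonstration; in practice the vector would be trained to align non-English representations with English ones.

```python
import numpy as np

def forward_with_steering(hidden, layers, steer_vec=None, steer_layer=None):
    """Run hidden states through a stack of layers, optionally adding a
    steering vector to the activations right after one chosen layer."""
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        if steer_vec is not None and i == steer_layer:
            # single-layer intervention: broadcast the vector over all
            # sequence positions at this layer only
            hidden = hidden + steer_vec
    return hidden

rng = np.random.default_rng(0)
d = 8
# toy stand-ins for transformer layers: fixed linear maps + nonlinearity
weights = [rng.standard_normal((d, d)) * 0.1 for _ in range(4)]
layers = [lambda h, W=W: np.tanh(h @ W) for W in weights]

hidden = rng.standard_normal((5, d))   # (seq_len, hidden_dim)
steer = rng.standard_normal(d) * 0.5   # hypothetical learned steering vector

plain = forward_with_steering(hidden, layers)
steered = forward_with_steering(hidden, layers, steer_vec=steer, steer_layer=1)
```

The intervention touches only one layer's activations and adds no parameters beyond the single vector of size `hidden_dim`, which is what makes the approach lightweight compared with fine-tuning or translation pipelines.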
Problem

Research questions and friction points this paper is trying to address.

Enhancing multilingual LLMs by aligning layer representations
Improving non-English token processing via representation steering
Matching translation baselines with single-layer activation adjustments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligning representations through learned vector steering
Enhancing performance via single-layer activation adjustments
Combining SFT and RLHF for multilingual representation alignment