CLaS-Bench: A Cross-Lingual Alignment and Steering Benchmark

📅 2026-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the absence of standardized benchmarks for evaluating the efficacy of internal representation steering in multilingual reasoning within large language models. The authors propose CLaS-Bench, a lightweight parallel question benchmark spanning 32 languages, which establishes the first systematic protocol for multilingual steering evaluation and introduces a harmonic mean score that jointly accounts for language control and semantic relevance. Through comprehensive experiments employing diverse steering methods—including DiffMean on residual streams, probing directions, language-specific neurons, PCA/LDA, sparse autoencoders, and prompting—the study demonstrates that DiffMean consistently achieves the best performance across all languages. Analyses further reveal that steering directions cluster by language family and that language-specific structures predominantly emerge in the model’s later layers, offering a promising pathway for adapting models to low-resource languages.
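The summary mentions a harmonic-mean score combining language control and semantic relevance. A minimal sketch of such a score, assuming both component metrics are already normalized to [0, 1] (the paper's exact metric definitions may differ):

```python
def steering_score(language_control: float, semantic_relevance: float) -> float:
    """Harmonic mean of language control and semantic relevance.

    The harmonic mean rewards methods that do well on BOTH axes:
    it collapses to 0 if either component is 0.
    """
    if language_control + semantic_relevance == 0:
        return 0.0
    return (2 * language_control * semantic_relevance
            / (language_control + semantic_relevance))
```

A method that forces the target language perfectly but destroys semantics (e.g. scores 1.0 and 0.0) gets an overall score of 0, which is the usual motivation for choosing a harmonic rather than arithmetic mean.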

📝 Abstract
Understanding and controlling the behavior of large language models (LLMs) is an increasingly important topic in multilingual NLP. Beyond prompting or fine-tuning, steering, i.e., manipulating internal representations during inference, has emerged as a more efficient and interpretable technique for adapting models to a target language. Yet, no dedicated benchmarks or evaluation protocols exist to quantify the effectiveness of steering techniques. We introduce CLaS-Bench, a lightweight parallel-question benchmark for evaluating language-forcing behavior in LLMs across 32 languages, enabling systematic evaluation of multilingual steering methods. We evaluate a broad array of steering techniques, including residual-stream DiffMean interventions, probe-derived directions, language-specific neurons, PCA/LDA vectors, sparse autoencoders, and prompting baselines. Steering performance is measured along two axes, language control and semantic relevance, combined into a single harmonic-mean steering score. We find that, across languages, a simple residual-based DiffMean method consistently outperforms all other methods. Moreover, a layer-wise analysis reveals that language-specific structure emerges predominantly in later layers and that steering directions cluster by language family. CLaS-Bench is the first standardized benchmark for multilingual steering, enabling both rigorous scientific analysis of language representations and practical evaluation of steering as a low-cost adaptation alternative.
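The abstract's best-performing method is a DiffMean intervention on the residual stream. A minimal sketch of the general DiffMean recipe, assuming precomputed residual-stream activations for target- and source-language prompts; the function names and the scale hyperparameter `alpha` are illustrative, not the paper's exact setup:

```python
import numpy as np

def diffmean_direction(target_acts: np.ndarray,
                       source_acts: np.ndarray) -> np.ndarray:
    """Steering direction as the difference of mean activations.

    target_acts, source_acts: (n_prompts, d_model) residual-stream
    activations collected at one layer for the two language conditions.
    """
    return target_acts.mean(axis=0) - source_acts.mean(axis=0)

def steer(residual: np.ndarray,
          direction: np.ndarray,
          alpha: float = 1.0) -> np.ndarray:
    """Add the scaled steering direction to a residual-stream activation
    during inference (applied at every token position at the chosen layer)."""
    return residual + alpha * direction
```

The appeal of this family of methods is that computing the direction needs only a forward pass over a small set of parallel prompts, with no gradient updates to the model.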
Problem

Research questions and friction points this paper is trying to address.

cross-lingual alignment
steering
large language models
multilingual NLP
benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual steering
CLaS-Bench
DiffMean
language-specific representations
multilingual alignment
Daniil Gurgurov
Saarland University, German Research Center for Artificial Intelligence (DFKI)
Yusser Al Ghussin
Saarland University, German Research Center for Artificial Intelligence (DFKI)
Tanja Bäumel
Saarland University, German Research Center for Artificial Intelligence (DFKI), Centre for European Research in Trusted AI (CERTAIN)
Cheng-Ting Chou
University of Illinois Urbana-Champaign
P. Schramowski
German Research Center for Artificial Intelligence (DFKI), TU Darmstadt, hessian.AI
Marius Mosbach
Mila - Quebec AI Institute, McGill University
NLP, Interpretability, Machine Learning
Josef van Genabith
DFKI German Research Center for Artificial Intelligence, Saarland University
Natural Language Processing, Machine Translation, Computational Linguistics, Computational Semantics
Simon Ostermann
Research Group Lead & Deputy Director at German Research Center for Artificial Intelligence (DFKI)
Large Language Models, Explainable AI, Mechanistic Interpretability, Efficient NLP, Low-Resource NLP