Paths Not Taken: Understanding and Mending the Multilingual Factual Recall Pipeline

📅 2025-05-26

🤖 AI Summary

Multilingual large language models recall facts markedly less reliably in non-English languages than in English. Mechanistic analysis shows that these models route multilingual queries through an English-centric factual recall mechanism and then translate the English answer back into the target language. Errors arise when that mechanism is insufficiently engaged or when the back-translation into the target language goes wrong. The authors propose two vector interventions, independent of language and dataset and grounded in mechanistic interpretability analysis, that redirect the model toward better internal paths. The interventions operate without fine-tuning or additional training, enabling targeted correction of intermediate-layer representations. They preserve English performance while boosting factual recall accuracy in the lowest-performing language by over 35%, substantially narrowing the gap between English and non-English performance. To the authors' knowledge, this is the first approach to achieve mechanism-level repair of cross-lingual factual consistency.

📝 Abstract

Multilingual large language models (LLMs) often exhibit factual inconsistencies across languages, with significantly better performance in factual recall tasks in English than in other languages. The causes of these failures, however, remain poorly understood. Using mechanistic analysis techniques, we uncover the underlying pipeline that LLMs employ, which involves using the English-centric factual recall mechanism to process multilingual queries and then translating English answers back into the target language. We identify two primary sources of error: insufficient engagement of the reliable English-centric mechanism for factual recall, and incorrect translation from English back into the target language for the final answer. To address these vulnerabilities, we introduce two vector interventions, both independent of languages and datasets, to redirect the model toward better internal paths for higher factual consistency. Our interventions combined increase the recall accuracy by over 35 percent for the lowest-performing language. Our findings demonstrate how mechanistic insights can be used to unlock latent multilingual capabilities in LLMs.
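The abstract does not spell out how the two vector interventions are built. As a rough illustration of what a vector intervention on a hidden state can look like, here is a minimal sketch of generic difference-of-means activation steering. It is not the paper's method: the function names, the toy activations, and the scaling factor `alpha` are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a batch of equally sized activation vectors."""
    n = len(rows)
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / n for i in range(dim)]


def steering_vector(target_acts: Sequence[Sequence[float]],
                    baseline_acts: Sequence[Sequence[float]]) -> Vector:
    """Difference of means: points from the baseline activations toward the target ones."""
    mu_target = mean_vector(target_acts)
    mu_baseline = mean_vector(baseline_acts)
    return [t - b for t, b in zip(mu_target, mu_baseline)]


def apply_intervention(hidden: Sequence[float],
                       vector: Sequence[float],
                       alpha: float = 1.0) -> Vector:
    """Add the scaled steering vector to a single hidden state (no weights change)."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy 3-dimensional activations (hypothetical values, not taken from any model).
english_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]   # assumed: English prompts
other_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]     # assumed: non-English prompts

v = steering_vector(english_acts, other_acts)
steered = apply_intervention([0.0, 2.0, 2.0], v, alpha=1.0)
print(v, steered)
```

In a real model the same addition would be applied to the residual stream at a chosen layer during the forward pass; which layer, which vector, and which scale to use are exactly the choices a mechanistic analysis is meant to inform.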

Problem

Research questions and friction points this paper is trying to address.

Multilingual LLMs show inconsistent factual recall across languages

Errors stem from English-centric recall and faulty translation

Combined vector interventions raise recall accuracy by over 35 percent for the lowest-performing language

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mechanistic analysis reveals English-centric recall pipeline

Vector interventions improve multilingual factual consistency

Language-independent methods boost recall accuracy significantly

🔎 Similar Papers

Selected Languages are All You Need for Cross-lingual Truthfulness Transfer