When Abundance Conceals Weakness: Knowledge Conflict in Multilingual Models

📅 2026-01-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inconsistency between internal beliefs and external evidence in multilingual large language models when confronted with cross-lingual knowledge conflicts, a phenomenon understudied beyond English-centric settings. The authors propose CLEAR, a framework that systematically evaluates how models reconcile language-dependent memorized knowledge with multilingual external evidence across four progressively complex scenarios. They construct multilingual versions of the existing ConflictQA and ConflictingQA benchmarks spanning ten typologically diverse languages and evaluate six representative LLMs. Their analysis reveals a task-dependent dichotomy between language resource abundance and linguistic relatedness: in reasoning-intensive tasks, high-resource languages exert greater influence, whereas in entity-level factual conflicts, linguistic proximity dominates, enabling low-resource but closely related languages to outweigh high-resource yet distantly related ones.

📝 Abstract
Large Language Models (LLMs) encode vast world knowledge across multiple languages, yet their internal beliefs are often unevenly distributed across linguistic spaces. When external evidence contradicts these language-dependent memories, models encounter cross-lingual knowledge conflict, a phenomenon largely unexplored beyond English-centric settings. We introduce CLEAR, a Cross-Lingual knowlEdge conflict evAluation fRamework that systematically examines how multilingual LLMs reconcile conflicting internal beliefs and multilingual external evidence. CLEAR decomposes conflict resolution into four progressive scenarios, from multilingual parametric elicitation to competitive multi-source cross-lingual induction, and systematically evaluates model behavior across two complementary QA benchmarks with distinct task characteristics. We construct multilingual versions of ConflictQA and ConflictingQA covering 10 typologically diverse languages and evaluate six representative LLMs. Our experiments reveal a task-dependent decision dichotomy. In reasoning-intensive tasks, conflict resolution is dominated by language resource abundance, with high-resource languages exerting stronger persuasive power. In contrast, for entity-centric factual conflicts, linguistic affinity, not resource scale, becomes decisive, allowing low-resource but linguistically aligned languages to outperform distant high-resource ones.
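The core evaluation idea, pairing a question with contradicting external evidence in another language and measuring whether the model follows its memory or the evidence, can be sketched as below. This is a minimal illustrative sketch, not the paper's actual CLEAR implementation; the prompt template, function names, and the example answers are all hypothetical.

```python
# Hypothetical sketch of a cross-lingual knowledge-conflict probe in the
# spirit of CLEAR. Prompt format and metric are illustrative assumptions,
# not the paper's published protocol.

def build_conflict_prompt(question: str, evidence: str, evidence_lang: str) -> str:
    """Pair an English question with contradicting evidence in another language."""
    return (
        f"Evidence ({evidence_lang}): {evidence}\n"
        f"Question (en): {question}\n"
        "Answer with a single entity:"
    )

def evidence_adoption_rate(answers: list[str], evidence_answer: str) -> float:
    """Fraction of sampled answers that follow the external evidence
    rather than the model's memorized (parametric) belief."""
    follows = sum(evidence_answer.lower() in a.lower() for a in answers)
    return follows / len(answers)

prompt = build_conflict_prompt(
    question="What is the capital of Australia?",
    evidence="Die Hauptstadt Australiens ist Sydney.",  # contradicts parametric memory
    evidence_lang="de",
)
# Mock model outputs for three sampled generations (no real model is called):
outputs = ["Sydney", "Canberra", "Sydney"]
print(round(evidence_adoption_rate(outputs, "Sydney"), 3))  # → 0.667
```

Comparing this adoption rate across evidence languages of varying resource level and linguistic proximity to the question language is one way to operationalize the resource-abundance vs. linguistic-affinity comparison the paper reports.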
Problem

Research questions and friction points this paper is trying to address.

cross-lingual knowledge conflict
multilingual LLMs
knowledge conflict
language resource abundance
linguistic affinity
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual knowledge conflict
multilingual LLMs
CLEAR framework
language resource abundance
linguistic affinity