CLM-Bench: Benchmarking and Analyzing Cross-lingual Misalignment of LLMs in Knowledge Editing

πŸ“… 2026-01-24
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing cross-lingual knowledge editing evaluation benchmarks rely on machine translation, introducing artifacts and overlooking Chinese culture-specific entities, thereby failing to accurately reflect large language models’ cross-lingual knowledge alignment capabilities. This work proposes CLM-Bench, a novel benchmark constructed natively in Chinese, comprising 1,010 CounterFact-style factual pairs rooted in Chinese cultural contexts yet aligned with English counterparts. Through geometric analysis of layer representations and verification of vector space orthogonality, we reveal that monolingual edits fail to transfer across languages due to orthogonal editing vectors, while mixed-language edits exhibit linear superposition properties. Experiments on models such as Llama-3 and Qwen2 demonstrate that CLM-Bench effectively exposes cross-lingual knowledge misalignment, offering a new paradigm and theoretical insights for multilingual knowledge editing.

πŸ“ Abstract
Knowledge Editing (KE) has emerged as a promising paradigm for updating facts in Large Language Models (LLMs) without retraining. However, progress in Multilingual Knowledge Editing (MKE) is currently hindered by biased evaluation frameworks. We observe that existing MKE benchmarks are typically constructed by mechanically translating English-centric datasets into target languages (e.g., English-to-Chinese). This approach introduces translation artifacts and neglects culturally specific entities native to the target language, failing to reflect the true knowledge distribution of LLMs. To address this, we propose CLM-Bench, a culture-aware benchmark constructed using a native Chinese-first methodology. We curate 1,010 high-quality CounterFact pairs rooted in Chinese cultural contexts and align them with English counterparts. Using CLM-Bench, we conduct extensive experiments on representative LLMs (e.g., Llama-3, Qwen2) and reveal a significant Cross-lingual Misalignment: edits in one language function independently and fail to propagate to the other. We further provide a geometric explanation via layer-wise representation analysis, demonstrating that edit vectors for Chinese and English are nearly orthogonal -- residing in disjoint subspaces -- while mixed-lingual editing exhibits linear additivity of these vectors. Our findings challenge the effectiveness of current methods in cross-lingual transfer and underscore the importance of culturally native benchmarks.
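The abstract's geometric claim can be illustrated numerically. The sketch below is a hypothetical stand-in, not the paper's code: it treats each monolingual edit as a flattened weight-delta vector (as ROME-style editors produce) and uses independent random Gaussian vectors as proxies, since independent high-dimensional directions are nearly orthogonal. It checks the two reported properties: near-zero cosine similarity between the Chinese and English edit vectors, and a mixed edit behaving as the linear superposition of the two.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096  # illustrative hidden size

# Hypothetical edit vectors: the weight update a knowledge-editing method
# writes into a layer, flattened. Random Gaussians stand in for real deltas.
v_zh = rng.standard_normal(d)  # delta from a Chinese-language edit
v_en = rng.standard_normal(d)  # delta from an English-language edit

def cos(a, b):
    """Cosine similarity between two flattened edit vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Near-orthogonality: edits occupy (almost) disjoint subspaces,
# so applying one leaves the other language's representation untouched.
print(f"cos(zh, en) = {cos(v_zh, v_en):.3f}")  # close to 0

# Linear additivity: model the mixed-lingual edit as the sum of the
# monolingual deltas; its overlap with each part is then ~1/sqrt(2).
v_mix = v_zh + v_en
print(f"cos(mix, zh) = {cos(v_mix, v_zh):.3f}")  # close to 0.707
print(f"cos(mix, en) = {cos(v_mix, v_en):.3f}")  # close to 0.707
```

Under the orthogonality assumption, the 1/√2 overlap of the mixed vector with each monolingual direction is exactly what linear superposition predicts; deviations from it would indicate interference between the two language subspaces.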
Problem

Research questions and friction points this paper is trying to address.

Cross-lingual Misalignment
Knowledge Editing
Multilingual LLMs
Cultural Bias
Evaluation Benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-lingual Misalignment
Knowledge Editing
Culture-aware Benchmark
Multilingual LLMs
Representation Geometry
Yucheng Hu
Tianjin University, School of Future Technology
Wei Zhou
Huazhong University of Science and Technology
IoT Security Β· System Security Β· Hardware Security
Juesi Xiao
Tianjin University, College of Intelligence and Computing