CLM-Bench: Benchmarking and Analyzing Cross-lingual Misalignment of LLMs in Knowledge Editing

πŸ“… 2026-01-24
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing cross-lingual knowledge editing evaluation benchmarks rely on machine translation, introducing artifacts and overlooking Chinese culture-specific entities, thereby failing to accurately reflect large language models’ cross-lingual knowledge alignment capabilities. This work proposes CLM-Bench, a novel benchmark constructed natively in Chinese, comprising 1,010 CounterFact-style factual pairs rooted in Chinese cultural contexts yet aligned with English counterparts. Through geometric analysis of layer representations and verification of vector space orthogonality, we reveal that monolingual edits fail to transfer across languages due to orthogonal editing vectors, while mixed-language edits exhibit linear superposition properties. Experiments on models such as Llama-3 and Qwen2 demonstrate that CLM-Bench effectively exposes cross-lingual knowledge misalignment, offering a new paradigm and theoretical insights for multilingual knowledge editing.

πŸ“ Abstract
Knowledge Editing (KE) has emerged as a promising paradigm for updating facts in Large Language Models (LLMs) without retraining. However, progress in Multilingual Knowledge Editing (MKE) is currently hindered by biased evaluation frameworks. We observe that existing MKE benchmarks are typically constructed by mechanically translating English-centric datasets into target languages (e.g., English-to-Chinese). This approach introduces translation artifacts and neglects culturally specific entities native to the target language, failing to reflect the true knowledge distribution of LLMs. To address this, we propose CLM-Bench, a culture-aware benchmark constructed using a native Chinese-first methodology. We curate 1,010 high-quality CounterFact pairs rooted in Chinese cultural contexts and align them with English counterparts. Using CLM-Bench, we conduct extensive experiments on representative LLMs (e.g., Llama-3, Qwen2) and reveal a significant Cross-lingual Misalignment: edits in one language function independently and fail to propagate to the other. We further provide a geometric explanation via layer-wise representation analysis, demonstrating that edit vectors for Chinese and English are nearly orthogonal -- residing in disjoint subspaces -- while mixed-lingual editing exhibits linear additivity of these vectors. Our findings challenge the effectiveness of current methods in cross-lingual transfer and underscore the importance of culturally native benchmarks.
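The abstract's geometric claim can be illustrated numerically. The sketch below is a hypothetical stand-in, not the paper's code: it treats each monolingual edit as a flattened weight-delta vector (as ROME-style editors produce) and uses independent random Gaussian vectors as proxies, since independent high-dimensional directions are nearly orthogonal. It checks the two reported properties: near-zero cosine similarity between the Chinese and English edit vectors, and a mixed edit behaving as the linear superposition of the two.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096  # illustrative hidden size

# Hypothetical edit vectors: the weight update a knowledge-editing method
# writes into a layer, flattened. Random Gaussians stand in for real deltas.
v_zh = rng.standard_normal(d)  # delta from a Chinese-language edit
v_en = rng.standard_normal(d)  # delta from an English-language edit

def cos(a, b):
    """Cosine similarity between two flattened edit vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Near-orthogonality: edits occupy (almost) disjoint subspaces,
# so applying one leaves the other language's representation untouched.
print(f"cos(zh, en) = {cos(v_zh, v_en):.3f}")  # close to 0

# Linear additivity: model the mixed-lingual edit as the sum of the
# monolingual deltas; its overlap with each part is then ~1/sqrt(2).
v_mix = v_zh + v_en
print(f"cos(mix, zh) = {cos(v_mix, v_zh):.3f}")  # close to 0.707
print(f"cos(mix, en) = {cos(v_mix, v_en):.3f}")  # close to 0.707
```

Under the orthogonality assumption, the 1/√2 overlap of the mixed vector with each monolingual direction is exactly what linear superposition predicts; deviations from it would indicate interference between the two language subspaces.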
Problem

Research questions and friction points this paper is trying to address.

Cross-lingual Misalignment
Knowledge Editing
Multilingual LLMs
Cultural Bias
Evaluation Benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-lingual Misalignment
Knowledge Editing
Culture-aware Benchmark
Multilingual LLMs
Representation Geometry
Yucheng Hu
Tianjin University, School of Future Technology
Wei Zhou
Huazhong University of Science and Technology
IoT Security Β· System Security Β· Hardware Security
Juesi Xiao
Tianjin University, College of Intelligence and Computing