BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning

📅 2024-06-25
🏛️ arXiv.org
📈 Citations: 4
Influential: 0

🤖 AI Summary
This work addresses two challenges in cross-lingual knowledge editing: weak cross-lingual generalization and the lack of a comprehensive evaluation benchmark. We introduce BMIKE-53, a multilingual in-context knowledge editing (IKE) benchmark covering 53 languages that unifies zsRE, CounterFact, and WikiFactDiff under a consistent framework and systematically evaluates zero-, one-, and few-shot cross-lingual generalization. The evaluation identifies three factors that influence editing efficacy: model scale, demonstration alignment, and script type (Latin vs. non-Latin). We also use demonstrations tailored to each evaluation metric. Experiments show that larger models and tailored demonstrations significantly improve cross-lingual editing performance, whereas languages written in non-Latin scripts underperform, in part because of language confusion. The benchmark and findings offer an empirical basis for further work on multilingual knowledge editing.

📝 Abstract
This paper introduces BMIKE-53, a comprehensive benchmark for cross-lingual in-context knowledge editing (IKE) across 53 languages, unifying three knowledge editing (KE) datasets: zsRE, CounterFact, and WikiFactDiff. Cross-lingual KE, which requires knowledge edited in one language to generalize across others while preserving unrelated knowledge, remains underexplored. To address this gap, we systematically evaluate IKE under zero-shot, one-shot, and few-shot setups, incorporating tailored metric-specific demonstrations. Our findings reveal that model scale and demonstration alignment critically govern cross-lingual IKE efficacy, with larger models and tailored demonstrations significantly improving performance. Linguistic properties, particularly script type, strongly influence performance variation across languages, with non-Latin languages underperforming due to issues like language confusion.
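
The zero-shot, one-shot, and few-shot setups described above differ only in how many demonstrations precede the edited fact in the prompt. The following is a minimal sketch of that idea, not the paper's actual template: the `Demo` class, the `build_ike_prompt` function, and the "New Fact / Question / Answer" labels are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Demo:
    """One in-context demonstration (hypothetical structure)."""
    new_fact: str
    question: str
    answer: str


def build_ike_prompt(new_fact: str, query: str,
                     demos: Sequence[Demo] = (), k: int = 0) -> str:
    """Build an in-context editing prompt with the first k demonstrations.

    k = 0 gives a zero-shot prompt, k = 1 one-shot, k > 1 few-shot.
    The field labels are an assumed format, not the benchmark's own.
    """
    blocks = [
        f"New Fact: {d.new_fact}\nQuestion: {d.question}\nAnswer: {d.answer}"
        for d in list(demos)[:k]
    ]
    # The final block states the edit and leaves the answer open for the model.
    blocks.append(f"New Fact: {new_fact}\nQuestion: {query}\nAnswer:")
    return "\n\n".join(blocks)
```

In a cross-lingual setting, the new fact would typically be stated in one language and the query posed in another; a metric-specific setup would then pick `demos` to match the metric being tested.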
Problem

Research questions and friction points this paper is trying to address.

Cross-lingual knowledge editing generalization
Evaluation of in-context learning setups
Impact of linguistic properties on performance
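
One way to examine the third point is to aggregate per-query correctness by script group. The sketch below assumes each evaluation record is a `(language, script, correct)` tuple; that record layout and the `accuracy_by_script` function are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_script(
    records: Iterable[Tuple[str, str, bool]]
) -> Dict[str, float]:
    """Return the fraction of correct answers for each script group.

    Each record is (language_code, script_label, is_correct); the script
    label (for example "Latin" or "non-Latin") is supplied by the caller.
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for _lang, script, correct in records:
        totals[script] += 1
        hits[script] += int(correct)
    # Only scripts that appear in the records are reported.
    return {script: hits[script] / totals[script] for script in totals}
```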
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-lingual knowledge editing benchmark
In-context learning evaluation setups
Integration of tailored, metric-specific demonstrations