MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models

📅 2024-04-07
🏛️ International Conference on Computational Linguistics
📈 Citations: 4
Influential: 0
🤖 AI Summary
Existing knowledge editing methods are predominantly evaluated in monolingual English settings, limiting assessment of how well they generalize to multilingual and cross-lingual scenarios. Method: We introduce MLaKE, the first multilingual knowledge editing benchmark for large language models (LLMs), covering English, Chinese, Japanese, French, and German. It comprises 9,432 single- and multi-hop QA instances in both generative and multiple-choice formats, constructed by aligning fact chains from multilingual Wikipedia and using LLMs to generate questions while preserving semantic consistency across languages. Contribution/Results: MLaKE enables the first systematic evaluation of knowledge editing methods across languages and language families. Empirical analysis reveals substantial performance degradation on non-English and cross-lingual tasks: current methods achieve peak accuracy in English, transfer moderately within a language family, but generalize consistently poorly across language families. This work establishes a reproducible benchmark and identifies concrete directions for advancing multilingual knowledge editing.

📝 Abstract
The extensive use of large language models (LLMs) underscores the crucial need for precise and up-to-date knowledge embedded within their intrinsic parameters. Existing research on knowledge editing concentrates primarily on monolingual scenarios, neglecting the complexities presented by multilingual contexts and multi-hop reasoning. To address these challenges, our study introduces MLaKE (Multilingual Language Knowledge Editing), a novel benchmark comprising 4,072 multi-hop and 5,360 single-hop questions designed to evaluate the adaptability of knowledge editing methods across five languages: English, Chinese, Japanese, French, and German. MLaKE aggregates fact chains from Wikipedia across languages and uses LLMs to generate questions in both free-form and multiple-choice formats. We evaluate the multilingual generalization capabilities of existing knowledge editing methods on MLaKE. Existing methods achieve higher success rates on English samples than on other languages, but their generalization is limited in multilingual experiments. Notably, they often generalize better to languages within the same language family than to languages from different families. These results underscore the pressing need for advances in multilingual knowledge editing, and we hope MLaKE can serve as a valuable resource for benchmarking and solution development.
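To make the evaluation setup concrete, here is a minimal sketch of how per-language edit success rates might be scored in a benchmark like this. The data, function name, and language codes are all illustrative assumptions, not details from the paper: each entry records whether a post-edit QA probe in a given language returned the edited fact.

```python
# Hypothetical sketch of scoring multilingual knowledge-editing probes.
# All data and names below are illustrative, not from the MLaKE paper.
from collections import defaultdict

def success_rates(results):
    """results: iterable of (language, correct) pairs from post-edit QA probes.
    Returns the fraction of correct answers per language."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for lang, correct in results:
        totals[lang] += 1
        if correct:
            hits[lang] += 1
    return {lang: hits[lang] / totals[lang] for lang in totals}

# Illustrative outcomes: a fact edited via English, then probed in
# each language (True = the model answers with the edited fact).
results = [
    ("en", True), ("en", True), ("en", False),  # edit language
    ("de", True), ("de", False),                # same family (Germanic)
    ("zh", False), ("zh", False),               # different family
]
rates = success_rates(results)
print(rates)  # per-language edit success rates
```

Comparing the rate in the edit language against same-family and cross-family languages gives exactly the kind of generalization gap the benchmark is designed to expose.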
Problem

Research questions and friction points this paper is trying to address.

Evaluates multilingual knowledge editing adaptability
Introduces MLaKE benchmark for language models
Highlights generalization challenges across language families
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual knowledge editing benchmark
Multi-hop reasoning evaluation
Cross-language generalization assessment