Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models

📅 2023-10-16
🏛️ Conference on Empirical Methods in Natural Language Processing
📈 Citations: 60
Influential: 5
📄 PDF
🤖 AI Summary
Multilingual pre-trained language models (PLMs) often give inconsistent factual predictions across languages for the same underlying fact, which undermines reliable multilingual use and model editing. Method: The authors propose Ranking-based Consistency (RankC), an accuracy-agnostic metric that quantifies cross-lingual consistency (CLC) by comparing how a model ranks candidate answers, by confidence, for factually equivalent queries in different languages. Through factual probing, cross-lingual comparative analysis, and a model-editing case study, they examine how model scale, language family, and training data affect CLC. Results: Increasing model size does not improve CLC, and the cross-lingual transfer of newly inserted knowledge is strongly constrained by the RankC score between the source and target languages, so RankC predicts which languages edited knowledge will migrate to. The work provides an interpretable, accuracy-independent framework for evaluating factual consistency in multilingual PLMs.
📝 Abstract
Multilingual large-scale Pretrained Language Models (PLMs) have been shown to store considerable amounts of factual knowledge, but large variations are observed across languages. With the ultimate goal of ensuring that users with different language backgrounds obtain consistent feedback from the same model, we study the cross-lingual consistency (CLC) of factual knowledge in various multilingual PLMs. To this end, we propose a Ranking-based Consistency (RankC) metric to evaluate knowledge consistency across languages independently from accuracy. Using this metric, we conduct an in-depth analysis of the determining factors for CLC, both at model level and at language-pair level. Among other results, we find that increasing model size leads to higher factual probing accuracy in most languages, but does not improve cross-lingual consistency. Finally, we conduct a case study on CLC when new factual associations are inserted in the PLMs via model editing. Results on a small sample of facts inserted in English reveal a clear pattern whereby the new piece of knowledge transfers only to languages with which English has a high RankC score.
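The core idea behind RankC, comparing confidence-based rankings of candidate answers across languages independently of accuracy, can be illustrated with a minimal sketch. This is an illustrative top-k rank-overlap measure over two languages' candidate rankings, not the paper's exact formulation; the function names and inputs are assumptions for demonstration.

```python
def topk_overlap(ranked_a, ranked_b):
    """Average top-j overlap between two candidate rankings for one query.

    ranked_a, ranked_b: candidate answers sorted by the model's
    confidence in language A and language B (highest first).
    Returns 1.0 for identical rankings, 0.0 for disjoint ones.
    """
    k = min(len(ranked_a), len(ranked_b))
    scores = []
    for j in range(1, k + 1):
        inter = len(set(ranked_a[:j]) & set(ranked_b[:j]))
        scores.append(inter / j)
    return sum(scores) / k

def consistency_score(queries_a, queries_b):
    """Mean rank-overlap over all factually equivalent query pairs."""
    pairs = list(zip(queries_a, queries_b))
    return sum(topk_overlap(a, b) for a, b in pairs) / len(pairs)

# Same fact probed in two languages; rankings agree on the top answer
# but swap the runners-up.
en = [["Paris", "Lyon", "Nice"]]
es = [["Paris", "Nice", "Lyon"]]
print(consistency_score(en, es))
```

Because only the relative ordering of candidates enters the score, a model can be consistently wrong in both languages and still score high, which is exactly the accuracy-independence the abstract describes.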
Problem

Research questions and friction points this paper is trying to address.

Assess cross-lingual consistency of factual knowledge in multilingual PLMs
Propose RankC metric to evaluate knowledge consistency across languages
Study factors affecting cross-lingual consistency at model and language levels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes RankC metric for cross-lingual consistency evaluation
Analyzes factors affecting cross-lingual consistency in PLMs
Studies knowledge transfer via model editing in multilingual PLMs
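The editing case study's finding, that knowledge inserted in English transfers mainly to languages with a high RankC score relative to English, amounts to a correlation between consistency and edit transferability. A minimal sketch of that check, using made-up placeholder numbers (not the paper's results):

```python
# Illustrative placeholder values: RankC of each language with English,
# and the fraction of English-inserted edits that transferred to it.
rankc_with_en = {"es": 0.72, "nl": 0.65, "vi": 0.31, "hu": 0.24}
transfer_rate = {"es": 0.60, "nl": 0.55, "vi": 0.15, "hu": 0.10}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

langs = sorted(rankc_with_en)
r = pearson([rankc_with_en[l] for l in langs],
            [transfer_rate[l] for l in langs])
print(f"correlation(RankC, transfer) = {r:.2f}")
```

A strong positive correlation on real probing data is what would support using RankC to predict where an edit made in one language will propagate.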