Uncovering inequalities in new knowledge learning by large language models across different languages

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies systematic inequities in large language models' (LLMs) multilingual continual learning of new knowledge. To address this, we propose the first dynamic fairness analytical framework, evaluating four critical dimensions—effectiveness, transferability, prioritization, and robustness—thereby moving beyond conventional static capability assessments. Our methodology integrates in-context learning with parameter-efficient fine-tuning, enabling cross-lingual, multi-scenario empirical evaluation across both closed- and open-source LLMs. Results demonstrate that low-resource languages consistently underperform across all four dimensions, with inequities intensifying progressively throughout the continual learning process. This study provides the first systematic empirical evidence characterizing multilingual learning biases in LLMs. Furthermore, it offers actionable, technically grounded recommendations—spanning data curation, training scheduling, and architectural design—to foster more equitable and inclusive next-generation language models.

📝 Abstract
As large language models (LLMs) gradually become integral tools for problem solving in daily life worldwide, understanding linguistic inequality is becoming increasingly important. Existing research has primarily focused on static analyses that assess the disparities in the existing knowledge and capabilities of LLMs across languages. However, LLMs are continuously evolving, acquiring new knowledge to generate up-to-date, domain-specific responses. Investigating linguistic inequalities within this dynamic process is, therefore, also essential. In this paper, we explore inequalities in new knowledge learning by LLMs across different languages and four key dimensions: effectiveness, transferability, prioritization, and robustness. Through extensive experiments under two settings (in-context learning and fine-tuning) using both proprietary and open-source models, we demonstrate that low-resource languages consistently face disadvantages across all four dimensions. By shedding light on these disparities, we aim to raise awareness of linguistic inequalities in LLMs' new knowledge learning, fostering the development of more inclusive and equitable future LLMs.
Problem

Research questions and friction points this paper is trying to address.

Inequalities in new knowledge learning by LLMs across languages.
Disparities in effectiveness, transferability, prioritization, and robustness.
Low-resource languages face disadvantages in LLM knowledge acquisition.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explores LLM knowledge learning inequalities across languages.
Uses in-context learning and fine-tuning experiments.
Focuses on effectiveness, transferability, prioritization, and robustness.
Chenglong Wang
School of Urban Planning and Design, Peking University Shenzhen Graduate School, Shenzhen, China; Key Laboratory of Earth Surface System and Human-Earth Relations of Ministry of Natural Resources of China, Peking University Shenzhen Graduate School, Shenzhen, China
Haoyu Tang
School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
Xiyuan Yang
UIUC
Trustworthy Machine Learning
Yueqi Xie
Princeton University
AI and Society, Responsible AI, Social Computing, Computational Social Science
Jina Suh
Microsoft Research, University of Washington
machine learning, human-computer interaction, mental health
Sunayana Sitaram
Microsoft Research India
Multilingual NLP, evaluation, LLMs and culture, multilingualism, LLMs
Junming Huang
Paul and Marcia Center on Contemporary China, Princeton University, Princeton, USA
Yu Xie
Paul and Marcia Center on Contemporary China, Princeton University, Princeton, USA; Center for Social Research, Guanghua School of Management, Peking University, Beijing, China
Zhaoya Gong
Peking University Shenzhen Graduate School
GIScience, GeoAI, Geocomputation, Urban and Regional Science, Cyberinfrastructure
Xing Xie
Microsoft Research Asia, Beijing, China
Fangzhao Wu
Microsoft
Responsible AI