A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

📅 2024-05-17
🏛️ arXiv.org
📈 Citations: 10
Influential: 0
🤖 AI Summary
This paper addresses language unfairness and performance imbalance in multilingual large language models (LLMs). To tackle this, it establishes the first multidimensional analytical framework anchored in linguistic fairness, integrating model capability, cultural adaptation, and safety governance. Through a systematic review of key research directions—including pretraining evolution, multilingual alignment, cross-lingual retrieval, culture-aware fine-tuning, and multilingual benchmarking—the work identifies twelve recurring challenges. It further constructs a knowledge graph spanning five core dimensions: training, inference, retrieval, safety, and evaluation. Additionally, it synthesizes six frontier strategies, including low-resource language enhancement and culturally adaptive inference. The framework provides both theoretical foundations and practical guidelines for paradigmatic transformation in multilingual LLM research, advancing equitable, robust, and context-sensitive language modeling across diverse linguistic and cultural contexts.

📝 Abstract
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing, attracting global attention in both academia and industry. To mitigate potential discrimination and enhance overall usability and accessibility for diverse language user groups, the development of language-fair technology is important. Despite the breakthroughs of LLMs, the investigation into multilingual scenarios remains insufficient, and a comprehensive survey summarizing recent approaches, developments, limitations, and potential solutions is desirable. To this end, we provide a survey with multiple perspectives on the utilization of LLMs in multilingual scenarios. We first rethink the transition between previous and current research on pre-trained language models. We then introduce several perspectives on the multilingualism of LLMs, including training and inference methods, information retrieval, model security, multi-domain settings with language culture, and the usage of datasets. We also discuss the major challenges that arise in these aspects, along with possible solutions. In addition, we highlight future research directions aimed at further enhancing LLMs with multilingualism. This survey aims to help the research community address multilingual problems and to provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
Problem

Research questions and friction points this paper is trying to address.

Multilingual Environment
Fairness
Performance Improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual Processing
Fairness in LLMs
Cross-cultural Adaptability
Kaiyu Huang
Beijing Jiaotong University, China
Fengran Mo
Ph.D. Student, Université de Montréal
Conversational AI · Information Retrieval · Natural Language Processing · Multilingualism
Hongliang Li
Beijing Jiaotong University, China
You Li
Beijing Jiaotong University, China
Yuan Zhang
Weijian Yi
Beijing Jiaotong University, China
Yulong Mao
Beijing Jiaotong University, China
Jinchen Liu
Beijing Jiaotong University, China
Yuzhuang Xu
Tsinghua University
Natural Language Processing · Efficient AI · Machine Learning
Jinan Xu
Professor, School of Computer and Information Technology, Beijing Jiaotong University
NLP · Machine Translation · LLM
Jian-Yun Nie
Université de Montréal
Information Retrieval · Natural Language Processing
Yang Liu
Tsinghua University, China