Multilingual Brain Surgeon: Large Language Models Can Be Compressed Leaving No Language behind

📅 2024-04-06
🏛️ International Conference on Language Resources and Evaluation
📈 Citations: 4
Influential: 0
📄 PDF

career value

207K/year
🤖 AI Summary
Existing multilingual large language model (LLM) compression methods rely on English-centric calibration sets, leading to substantial performance degradation—particularly for low-resource languages. Method: This paper proposes a novel multilingual calibration data sampling strategy grounded in the language distribution of the original training corpus, enabling calibration set construction that faithfully mirrors the proportional representation of languages in the pretraining data—the first such approach. It further integrates model pruning, quantization, and BLOOM-specific architectural adaptations, validated via cross-lingual performance attribution analysis. Contribution/Results: Experiments on BLOOM demonstrate that our method significantly narrows the cross-lingual performance gap: BLEU scores for multiple low-resource languages improve by over 15%. The work uncovers a synergistic interaction between linguistic similarity and training-data language proportion in preserving post-compression multilingual performance. Overall, it establishes a practical, non-English-centric paradigm for efficient multilingual LLM compression.

Technology Category

Application Category

📝 Abstract
Large Language Models (LLMs) have ushered in a new era in Natural Language Processing, but their massive size demands effective compression techniques for practicality. Although numerous model compression techniques have been investigated, they typically rely on a calibration set that overlooks the multilingual context and results in significant accuracy degradation for low-resource languages. This paper introduces Multilingual Brain Surgeon (MBS), a novel calibration data sampling method for multilingual LLMs compression. MBS overcomes the English-centric limitations of existing methods by sampling calibration data from various languages proportionally to the language distribution of the model training datasets. Our experiments, conducted on the BLOOM multilingual LLM, demonstrate that MBS improves the performance of existing English-centric compression methods, especially for low-resource languages. We also uncover the dynamics of language interaction during compression, revealing that the larger the proportion of a language in the training set and the more similar the language is to the calibration language, the better performance the language retains after compression. In conclusion, MBS presents an innovative approach to compressing multilingual LLMs, addressing the performance disparities and improving the language inclusivity of existing compression techniques. Keywords: Large Language Model, Multilingual Model Compression
Problem

Research questions and friction points this paper is trying to address.

Compressing multilingual LLMs without degrading low-resource language performance
Overcoming English-centric bias in calibration data for model compression
Improving language inclusivity in compression techniques for multilingual models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proportional multilingual calibration data sampling
Improves low-resource language compression accuracy
Balances language performance via training distribution
🔎 Similar Papers
No similar papers found.
H
Hongchuan Zeng
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China
Hongshen Xu
Hongshen Xu
Shanghai Jiao Tong University
Natural Language ProcessingLarge Language ModelLLM Alignment
L
Lu Chen
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; Suzhou Laboratory, Suzhou, China
K
Kai Yu
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; Suzhou Laboratory, Suzhou, China