Multilingual Brain Surgeon: Large Language Models Can Be Compressed Leaving No Language behind

📅 2024-04-06

🏛️ International Conference on Language Resources and Evaluation

📈 Citations: 4

✨ Influential: 0

career value

170K/year

🤖 AI Summary

Existing multilingual large language model (LLM) compression methods rely on English-centric calibration sets, leading to substantial performance degradation—particularly for low-resource languages. Method: This paper proposes a novel multilingual calibration data sampling strategy grounded in the language distribution of the original training corpus, enabling calibration set construction that faithfully mirrors the proportional representation of languages in the pretraining data—the first such approach. It further integrates model pruning, quantization, and BLOOM-specific architectural adaptations, validated via cross-lingual performance attribution analysis. Contribution/Results: Experiments on BLOOM demonstrate that our method significantly narrows the cross-lingual performance gap: BLEU scores for multiple low-resource languages improve by over 15%. The work uncovers a synergistic interaction between linguistic similarity and training-data language proportion in preserving post-compression multilingual performance. Overall, it establishes a practical, non-English-centric paradigm for efficient multilingual LLM compression.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) have ushered in a new era in Natural Language Processing, but their massive size demands effective compression techniques for practicality. Although numerous model compression techniques have been investigated, they typically rely on a calibration set that overlooks the multilingual context and results in significant accuracy degradation for low-resource languages. This paper introduces Multilingual Brain Surgeon (MBS), a novel calibration data sampling method for multilingual LLMs compression. MBS overcomes the English-centric limitations of existing methods by sampling calibration data from various languages proportionally to the language distribution of the model training datasets. Our experiments, conducted on the BLOOM multilingual LLM, demonstrate that MBS improves the performance of existing English-centric compression methods, especially for low-resource languages. We also uncover the dynamics of language interaction during compression, revealing that the larger the proportion of a language in the training set and the more similar the language is to the calibration language, the better performance the language retains after compression. In conclusion, MBS presents an innovative approach to compressing multilingual LLMs, addressing the performance disparities and improving the language inclusivity of existing compression techniques. Keywords: Large Language Model, Multilingual Model Compression

Problem

Research questions and friction points this paper is trying to address.

Compressing multilingual LLMs without degrading low-resource language performance

Overcoming English-centric bias in calibration data for model compression

Improving language inclusivity in compression techniques for multilingual models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proportional multilingual calibration data sampling

Improves low-resource language compression accuracy

Balances language performance via training distribution

🔎 Similar Papers

No similar papers found.