🤖 AI Summary
Existing pruning methods for multilingual large language models (LLMs) commonly rely solely on English calibration data, neglecting linguistic diversity and potentially compromising the retention of cross-lingual capabilities. Method: We conduct the first systematic empirical study—across languages (English, Chinese, French, Spanish), tasks (downstream understanding and language modeling), and models (mT5, BLOOM, Qwen)—evaluating mainstream pruning techniques (Magnitude, SNIP, GRASP) calibrated on the target language versus English only. Results: Target-language calibration effectively preserves language-specific modeling capacity but does not consistently improve downstream task performance. Crucially, language-agnostic reasoning and knowledge representations prove highly fragile during pruning and are disproportionately removed. Our findings uncover a critical inconsistency between calibration language choice and functional capability preservation, challenging the assumption that the choice of calibration language is neutral. This work provides both theoretical insight into representation vulnerability in multilingual LLMs and practical guidance for designing language-aware pruning strategies that enable efficient, equitable multilingual model compression.
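To make the role of calibration data concrete, here is a minimal NumPy sketch (not the paper's implementation) contrasting pure magnitude pruning, which ignores calibration data entirely, with a calibration-aware score in the style of activation-weighted pruning, where per-weight importance is |W_ij| · ||x_j||. The "English" and "target-language" batches below are synthetic stand-ins; the point is only that changing the calibration distribution changes which weights survive.

```python
import numpy as np

def magnitude_mask(W, sparsity):
    """Keep the largest-|w| weights; calibration data plays no role."""
    k = int(W.size * sparsity)
    thresh = np.sort(np.abs(W), axis=None)[k]
    return np.abs(W) >= thresh

def calibrated_mask(W, X, sparsity):
    """Activation-weighted score |W_ij| * ||x_j||_2 over a calibration
    batch X (rows = tokens, cols = input features). A different
    calibration language yields a different X, hence a different mask."""
    score = np.abs(W) * np.linalg.norm(X, axis=0)  # broadcast over rows
    k = int(score.size * sparsity)
    thresh = np.sort(score, axis=None)[k]
    return score >= thresh

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X_en = rng.normal(size=(32, 16))                              # "English" batch
X_tgt = rng.normal(size=(32, 16)) * np.linspace(0.1, 2.0, 16) # shifted feature stats

m_en = calibrated_mask(W, X_en, 0.5)
m_tgt = calibrated_mask(W, X_tgt, 0.5)
print("weights kept:", int(m_en.sum()))
print("mask entries that differ across calibration sets:", int((m_en != m_tgt).sum()))
```

Both masks keep the same number of weights, yet they disagree on which ones, which is exactly why the choice of calibration language is not neutral.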
📝 Abstract
Recent advances in large language model (LLM) pruning have shown state-of-the-art (SotA) compression results in post-training and retraining-free settings while maintaining high predictive performance. However, previous research has mainly considered calibration based on English text, despite the multilingual nature of modern LLMs and their frequent use in non-English languages. In this paper, we set out to investigate how to calibrate the pruning of multilingual language models for monolingual applications. We present the first comprehensive empirical study comparing different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques. Our results offer practical suggestions; for example, calibrating in the target language efficiently retains the language modeling capability but does not necessarily benefit downstream tasks. Through further analysis of latent subspaces, pruning masks, and individual neurons within pruned models, we find that while pruning generally preserves strong language-specific features, it may fail to retain language-specific neuron activation patterns and the subtle, language-agnostic features associated with knowledge and reasoning that complex tasks require.