🤖 AI Summary
To address the sharp cross-lingual performance degradation in one-shot pruning of multilingual large language models—caused by neglecting linguistic disparities—this paper proposes a language-aware dynamic pruning method. The core innovation lies in (1) explicitly modeling multilingual capability preservation as an optimization objective, (2) characterizing cross-lingual differences via language-grouped activation statistics to dynamically allocate layer-wise sparsity, and (3) adapting the Wanda pruning criterion for multilingual settings. Evaluated on multiple multilingual benchmarks, the method significantly mitigates performance collapse at moderate sparsity levels (30%–50%), yielding average gains of 2.1–4.7 BLEU/accuracy points with negligible computational overhead. Results demonstrate that language-aware sparsity scheduling is critical for balancing multilingual performance retention and model compression efficiency.
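The three ingredients above can be sketched together: Wanda scores each weight by the product of its magnitude and the L2 norm of its input feature's calibration activations; a language-aware variant would gather those norms per language group before aggregating. The aggregation rule (a plain mean over languages) and the function names below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def language_aware_wanda_scores(W, acts_by_lang):
    """Wanda-style importance scores with per-language calibration.

    W: (out_features, in_features) weight matrix.
    acts_by_lang: dict mapping language -> (n_samples, in_features)
        calibration activations for that language.

    NOTE: averaging per-language norms is an illustrative assumption;
    the paper's actual aggregation may differ.
    """
    # Per-language L2 norm of each input feature's activations.
    norms = np.stack([
        np.linalg.norm(X, axis=0) for X in acts_by_lang.values()
    ])  # shape: (n_langs, in_features)
    # Aggregate across languages so no single language dominates.
    agg = norms.mean(axis=0)  # shape: (in_features,)
    # Wanda importance: |weight| * activation norm of its input feature.
    return np.abs(W) * agg[None, :]

def prune_rowwise(W, scores, sparsity):
    """Zero the lowest-scoring weights within each output row."""
    k = int(W.shape[1] * sparsity)
    pruned = W.copy()
    if k > 0:
        idx = np.argsort(scores, axis=1)[:, :k]  # k lowest per row
        np.put_along_axis(pruned, idx, 0.0, axis=1)
    return pruned
```

Row-wise (per-output) pruning follows the original Wanda recipe; the language-aware part only changes how the activation statistic is computed.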
📝 Abstract
Multilingual LLM performance is often critically dependent on model size. Combined with efficiency concerns, this has led to a surge of interest in one-shot pruning methods that retain the benefits of large-scale pretraining while shrinking the model size. However, as pruning tends to come with performance loss, it is important to understand the trade-offs between multilinguality and sparsification. In this work, we study multilingual performance under different sparsity constraints and show that moderate sparsity ratios already substantially harm performance. To help bridge this gap, we propose M-Wanda, a pruning method that models cross-lingual variation by incorporating language-aware activation statistics into its pruning criterion and dynamically adjusts layerwise sparsity based on cross-lingual importance. We show that M-Wanda consistently improves performance at minimal additional cost. We are the first to explicitly optimize pruning to retain multilingual performance, and hope to inspire future advances in multilingual pruning.
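The dynamic layerwise sparsity mentioned above can be illustrated with a simple allocation rule: keep the global sparsity budget fixed while letting layers with higher cross-lingual importance retain more weights. The linear reallocation scheme, the deviation cap, and the use of cross-lingual variance as the importance signal are all assumptions for the sake of this sketch, not the paper's exact method.

```python
import numpy as np

def allocate_layer_sparsity(importance, target, max_dev=0.1):
    """Distribute a global sparsity target across layers.

    importance: (n_layers,) nonnegative scores, e.g. the variance of
        activation norms across language groups (an assumed proxy).
    target: desired average sparsity, e.g. 0.5.
    max_dev: cap on per-layer deviation from the target.

    Returns per-layer sparsity ratios whose mean equals `target`
    (barring clipping at the [0, 1] boundaries).
    """
    imp = np.asarray(importance, dtype=float)
    # Center importance so reallocations cancel out on average.
    centered = imp - imp.mean()
    scale = np.abs(centered).max()
    offsets = np.zeros_like(imp) if scale == 0 else max_dev * centered / scale
    # More important layer -> lower sparsity (more weights kept).
    return np.clip(target - offsets, 0.0, 1.0)
```

Each layer would then be pruned independently at its assigned ratio, so the overall compression matches the global target while important layers stay denser.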