Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

πŸ“… 2025-02-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the high training cost and performance degradation of Small Language Models (SLMs) deployed on edge devices, this paper proposes Adapt-Pruner, a layer-adaptive incremental pruning framework. The method introduces a pruning–fine-tuning alternation mechanism: in each iteration, only ~5% of structured neurons are pruned, with layer-wise sparsity dynamically adjusted. It combines structured pruning, progressive parameter reduction, and knowledge distillation to efficiently compress large models such as LLaMA-3.1-8B. On commonsense reasoning benchmarks, the approach outperforms conventional pruning methods by 1–7% average accuracy. The compressed MobileLLM-125M matches the performance of the original 600M model while requiring only 1/200 of the training tokens, and a newly constructed 1B-scale SLM outperforms LLaMA-3.2-1B across multiple benchmarks. The framework balances accuracy, training efficiency, and deployment feasibility, offering a practical recipe for lightweight SLM training.
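The incremental loop described above (prune a small fraction of neurons, recover with brief training, repeat) can be sketched as follows. This is a minimal, framework-free illustration, not the paper's exact algorithm: neuron importance is modeled as plain scores, the pruning budget per step and the global ranking heuristic are assumptions, and the interleaved fine-tuning is left as a placeholder.

```python
import math

def adaptive_incremental_prune(layer_scores, target_keep=0.5, step=0.05):
    """Illustrative sketch of incremental, layer-adaptive structured pruning.

    layer_scores: list of per-layer lists of neuron importance scores
                  (stand-ins for real importance estimates).
    target_keep:  fraction of the original neuron count to retain overall.
    step:         fraction of remaining neurons pruned per iteration
                  (~5% per step, as in the paper).
    """
    total = sum(len(layer) for layer in layer_scores)
    target = math.ceil(total * target_keep)
    while sum(len(layer) for layer in layer_scores) > target:
        remaining = sum(len(layer) for layer in layer_scores)
        budget = min(max(1, int(remaining * step)), remaining - target)
        # Rank all neurons globally by importance; because each layer loses
        # only its own low-scoring neurons, per-layer sparsity adapts
        # automatically (a simple stand-in for the paper's criterion).
        flat = [(score, li, ni)
                for li, layer in enumerate(layer_scores)
                for ni, score in enumerate(layer)]
        flat.sort()  # least important neurons first
        to_drop = {(li, ni) for _, li, ni in flat[:budget]}
        layer_scores = [[s for ni, s in enumerate(layer)
                         if (li, ni) not in to_drop]
                        for li, layer in enumerate(layer_scores)]
        # ... the paper interleaves a short fine-tuning / distillation phase
        # here so the model recovers before the next pruning step ...
    return layer_scores
```

The key design point the sketch captures is that pruning is spread over many small steps with recovery in between, rather than removing a large fraction of parameters in one shot.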

πŸ“ Abstract
Small language models (SLMs) have attracted considerable attention from both academia and industry due to their broad range of applications in edge devices. To obtain SLMs with strong performance, conventional approaches either pre-train the models from scratch, which incurs substantial computational costs, or compress/prune existing large language models (LLMs), which results in performance drops and falls short in comparison to pre-training. In this paper, we investigate the family of acceleration methods that involve both structured pruning and model training. We find that 1) layer-wise adaptive pruning (Adapt-Pruner) is extremely effective in LLMs and yields significant improvements over existing pruning techniques, 2) adaptive pruning equipped with further training leads to models comparable to those pre-trained from scratch, 3) incremental pruning brings non-trivial performance gains by interleaving pruning with training and removing only a small portion of neurons (~5%) at a time. Experimental results on LLaMA-3.1-8B demonstrate that Adapt-Pruner outperforms conventional pruning methods, such as LLM-Pruner, FLAP, and SliceGPT, by an average of 1%-7% in accuracy on commonsense benchmarks. Additionally, Adapt-Pruner restores the performance of MobileLLM-125M to 600M on the MMLU benchmark with 200× fewer tokens via pruning from its larger counterparts, and discovers a new 1B model that surpasses LLaMA-3.2-1B in multiple benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Enhances small language model efficiency
Reduces computational costs in training
Improves performance over traditional pruning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive pruning enhances model efficiency
Incremental pruning boosts performance significantly
Pruning with training matches pre-training results
πŸ”Ž Similar Papers
No similar papers found.