Exploring the Limits of Pruning: Task-Specific Neurons, Model Collapse, and Recovery in Task-Specific Large Language Models

📅 2026-04-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether neurons in task-specific large language models contribute uniformly to the target task, and uses pruning to reduce computational overhead. Through systematic neuron ablation experiments on models specialized for mathematical reasoning and code generation, the authors use an activation-based selectivity metric to identify low-contribution neurons and compare this selective pruning against random pruning. The work provides empirical evidence that critical task information is concentrated in a small subset of neurons: removing roughly 10% of the most task-specific neurons causes complete performance collapse, whereas selectively pruning roughly 30–35% of the less critical neurons reduces accuracy but still preserves significant performance. Pruning reduces parameter count and runtime VRAM usage and improves inference throughput, and subsequent fine-tuning substantially recovers task accuracy, particularly for aggressively pruned models.

📝 Abstract

Neuron pruning is widely used to reduce the computational cost and parameter footprint of large language models, yet it remains unclear whether neurons in task-specific models contribute uniformly to task performance. In this work, we provide empirical evidence for the existence and importance of task-specific neurons through a systematic pruning study on language models specialized for mathematical reasoning and code generation. We introduce an activation-based selectivity metric to identify neurons with low contribution to the target task, prune them while preserving target-task accuracy, and compare selective pruning with random pruning. Selective pruning consistently outperforms random pruning, indicating that activation-based selectivity provides a systematic advantage. Reverse pruning experiments further show that removing a small subset of highly task-specific neurons (~10%) causes complete performance collapse, suggesting that task-specific neurons exist and that critical task information is concentrated in a small portion of the network. In contrast, selective pruning of less critical neurons (~30–35%) reduces accuracy but still preserves significant performance. We also observe consistent reductions in parameters and runtime VRAM usage, along with improved inference throughput, as pruning increases. Experiments on both 1.5B and 7B models reveal a robustness threshold around 15–20% pruning, beyond which accuracy loss and generation failures increase sharply. Fine-tuning substantially recovers performance across pruning levels, particularly for aggressively pruned models. These findings provide empirical evidence of neuron specialization in task-specific language models and offer insights into pruning robustness, model redundancy, and post-pruning recoverability.
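The three pruning regimes contrasted in the abstract (selective, random, and reverse) can be illustrated with a small sketch. The abstract does not give the formula for the selectivity metric, so the score used here (mean absolute activation on target-task inputs divided by mean absolute activation on general inputs) is an assumption chosen only for illustration, and every function name is hypothetical. The sketch also masks neurons rather than physically removing them, so it would not reproduce the reported parameter or VRAM savings.

```python
import random

def mean_abs_activation(acts):
    """Per-neuron mean of |activation| over a list of per-example activation vectors."""
    n, width = len(acts), len(acts[0])
    return [sum(abs(row[j]) for row in acts) / n for j in range(width)]

def selectivity_scores(task_acts, general_acts, eps=1e-8):
    """ASSUMED metric: task activity relative to general activity, per neuron."""
    task = mean_abs_activation(task_acts)
    general = mean_abs_activation(general_acts)
    return [t / (g + eps) for t, g in zip(task, general)]

def neurons_to_prune(scores, fraction, mode="selective", seed=0):
    """Return sorted indices of the neurons to prune under one of three regimes."""
    k = int(len(scores) * fraction)
    order = sorted(range(len(scores)), key=lambda j: scores[j])  # ascending score
    if mode == "selective":   # drop the least task-selective neurons
        return sorted(order[:k])
    if mode == "reverse":     # drop the most task-selective neurons
        return sorted(order[len(order) - k:])
    if mode == "random":      # baseline: drop k neurons uniformly at random
        return sorted(random.Random(seed).sample(range(len(scores)), k))
    raise ValueError(f"unknown mode: {mode}")

def prune_mask(width, pruned):
    """Multiplicative mask: 0.0 for pruned neurons, 1.0 for kept neurons."""
    pruned = set(pruned)
    return [0.0 if j in pruned else 1.0 for j in range(width)]

# Toy activations for a 4-neuron layer (made-up numbers, not data from the paper).
task_acts = [[0.1, 2.0, 0.0, 1.0], [0.3, 4.0, 0.2, 1.0]]
general_acts = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]

scores = selectivity_scores(task_acts, general_acts)
selective = neurons_to_prune(scores, 0.5, mode="selective")
reverse = neurons_to_prune(scores, 0.5, mode="reverse")
baseline = neurons_to_prune(scores, 0.5, mode="random")
mask = prune_mask(len(scores), selective)
```

In this toy layer, selective pruning removes the two neurons that are least active on the task inputs, while reverse pruning removes the two most active ones. In a real model the activations would be collected from the network's hidden layers over a task dataset and a reference dataset.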

Problem

Research questions and friction points this paper is trying to address.

task-specific neurons

model pruning

model collapse

large language models

neuron specialization

Innovation

Methods, ideas, or system contributions that make the work stand out.

task-specific neurons

activation-based selectivity

neuron pruning