🤖 AI Summary
To address the substantial computational and memory overheads of oversized modern deep neural networks, this paper proposes an efficient structured pruning method based on exponential force-field regularization. Unlike conventional linear torque regularization, the authors introduce an exponentially growing force function to impose weight constraints: redundant modules farther from a pivot point experience stronger compressive forces, which significantly enhances pruning selectivity and automation. The method jointly optimizes weights and architecture during training, achieving up to 10.3× model compression across diverse vision and language tasks while incurring less than 0.3% accuracy degradation, outperforming current state-of-the-art approaches. Key contributions include: (1) a novel exponential force-field regularization mechanism that dynamically modulates structural sparsity; and (2) an end-to-end trainable structured pruning framework delivering high fidelity, high compression ratios, and seamless integration into standard training pipelines.
📝 Abstract
The rapid growth in complexity and size of modern deep neural networks (DNNs) has increased challenges related to computational costs and memory usage, spurring a growing interest in efficient model compression techniques. A previous state-of-the-art approach proposed a Torque-inspired regularization, which forces the weights of neural modules around a selected pivot point. However, we observe that the pruning effect of this approach is far from perfect: the post-trained network remains dense and also suffers from a substantial accuracy drop. We attribute this ineffectiveness to the default linear force application scheme, which imposes inappropriate forces on neural modules at different distances from the pivot. To efficiently prune the redundant, distant modules while retaining those that are close and necessary for effective inference, we propose Exponential Torque Pruning (ETP), which adopts an exponential force application scheme for regularization. Experimental results across a broad range of domains demonstrate that, despite its simplicity, ETP achieves a significantly higher compression rate than previous state-of-the-art pruning strategies with negligible accuracy drop.
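The abstract does not spell out the regularizer's exact form, but the core idea (a penalty on each module that grows exponentially with its distance from a pivot, so distant modules are driven toward zero and can be pruned) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, the `alpha`/`lam` hyperparameters, and the choice of per-module L2 norm as the "force target" are all assumptions.

```python
import numpy as np

def exponential_torque_penalty(weights, pivot=0, alpha=0.5, lam=1e-3):
    """Toy sketch of an exponential force-field regularizer (assumed form).

    `weights`: 2D array with one row per neural module (e.g. a channel's
    flattened filter). Each module's penalty scales exponentially with its
    index distance from `pivot`, so far-away (presumed redundant) modules
    are pushed toward zero more strongly than nearby ones.
    """
    n = weights.shape[0]
    distances = np.abs(np.arange(n) - pivot)     # distance of each module from the pivot
    force = np.exp(alpha * distances)            # exponentially growing force per module
    row_norms = np.linalg.norm(weights, axis=1)  # magnitude of each module
    return lam * float(np.sum(force * row_norms))  # scalar penalty added to the training loss
```

Added to the task loss during training, this term shrinks distant modules far faster than nearby ones; by contrast, a linear scheme (replacing `np.exp(alpha * distances)` with `distances`) applies comparatively weak pressure on distant modules, which matches the paper's diagnosis of why linear torque regularization leaves the network dense.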