Neural expressiveness for beyond importance model compression

📅 2025-12-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional pruning methods over-rely on weight “importance” while neglecting neurons’ capacity to redistribute information via activation overlap—termed “expressiveness.” Method: This paper introduces expressiveness as a novel model compression criterion, quantifying neuronal activation overlap to assess information representation capability and to reduce redundancy. The proposed data- and initialization-agnostic pruning strategy fuses expressiveness with conventional importance metrics, enabling lightweight hybrid pruning with single-sample or zero-shot approximations. Contribution/Results: Evaluated on YOLOv8, the method achieves a 55.4% parameter reduction, a 46.1% MACs decrease, and a 3.0% mAP<sub>50–95</sub> gain. Compared to pure weight-based pruning, it attains up to 10× higher compression ratios with only ~1% accuracy degradation, establishing expressiveness as a generalizable, state-robust criterion for neural network compression.
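The core idea—scoring neurons by how much their activation patterns overlap with their peers—can be sketched as below. The Jaccard-style overlap on binarized activations and the scoring rule are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def expressivity_scores(activations):
    """Score each neuron by one minus its mean activation overlap with the
    other neurons in the layer. `activations` has shape
    (num_neurons, num_samples). Low overlap -> high expressiveness."""
    # Binarize: a neuron "fires" on a sample if its activation is positive.
    fired = (activations > 0).astype(float)
    n = fired.shape[0]
    # Pairwise Jaccard-style overlap between neuron firing patterns.
    inter = fired @ fired.T                                        # |A ∩ B|
    union = fired.sum(1)[:, None] + fired.sum(1)[None, :] - inter  # |A ∪ B|
    overlap = np.where(union > 0, inter / np.maximum(union, 1.0), 0.0)
    np.fill_diagonal(overlap, 0.0)
    # A neuron overlapping heavily with its peers is redundant.
    return 1.0 - overlap.sum(1) / (n - 1)

rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 128))
acts[1] = acts[0]  # duplicated neuron: maximal overlap with neuron 0
scores = expressivity_scores(acts)
```

The duplicated neurons 0 and 1 receive identical, below-average scores, so an overlap-based pruner would target one of them first—redundancy the magnitude of their weights alone would not reveal.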

📝 Abstract
Neural Network Pruning has been established as a driving force in the exploration of memory- and energy-efficient solutions with high throughput, both during training and at test time. In this paper, we introduce a novel criterion for model compression, named "Expressiveness". Unlike existing pruning methods that rely on the inherent "Importance" of neurons' and filters' weights, "Expressiveness" emphasizes the ability of a neuron, or group of neurons, to redistribute informational resources effectively, based on the overlap of activations. This characteristic is strongly correlated with a network's initialization state, making the criterion autonomous from the learning state and thus setting a new fundamental basis for expanding compression strategies with regard to the "When to Prune" question. We show that expressiveness is effectively approximated with arbitrary data or a limited number of representative samples from the dataset, laying the ground for the exploration of data-agnostic strategies. Our work also facilitates a "hybrid" formulation of expressiveness- and importance-based pruning strategies, illustrating their complementary benefits and delivering up to 10x extra gains in parameter compression ratios w.r.t. weight-based approaches, with an average performance degradation of 1%. We also show that employing expressiveness independently for pruning improves compression efficiency over top-performing and foundational methods. Finally, on YOLOv8, we achieve a 46.1% MACs reduction by removing 55.4% of the parameters, with a 3% increase in mean Average Precision ($mAP_{50-95}$) for object detection on the COCO dataset.
Problem

Research questions and friction points this paper is trying to address.

Existing pruning criteria over-rely on weight "Importance" and ignore activation overlap between neurons.
Most pruning strategies depend on the network's learning state and on access to the training data.
Weight-based approaches alone leave compression gains on the table that a complementary criterion could recover.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expressiveness criterion for neuron pruning based on activation overlap
Data-agnostic strategy using arbitrary or limited representative samples
Hybrid approach combining expressiveness and importance for enhanced compression
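The hybrid approach above can be sketched as a simple score fusion. The min-max normalization, the weighting `alpha`, and the top-k keep rule are illustrative assumptions; the paper's actual fusion rule may differ:

```python
import numpy as np

def hybrid_prune_mask(importance, expressivity, keep_ratio=0.5, alpha=0.5):
    """Fuse per-neuron weight-importance and expressiveness scores and
    keep the top `keep_ratio` fraction of neurons. `alpha` trades
    importance against expressiveness (assumed fusion rule)."""
    def minmax(x):
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    score = alpha * minmax(importance) + (1.0 - alpha) * minmax(expressivity)
    k = max(1, int(round(keep_ratio * len(score))))
    keep = np.argsort(score)[::-1][:k]     # indices of highest fused scores
    mask = np.zeros(len(score), dtype=bool)
    mask[keep] = True                       # True = neuron survives pruning
    return mask

imp = np.array([0.9, 0.1, 0.5, 0.4])   # e.g. weight-magnitude importance
exp_ = np.array([0.2, 0.9, 0.8, 0.1])  # e.g. activation-overlap expressiveness
mask = hybrid_prune_mask(imp, exp_, keep_ratio=0.5)
```

Here neuron 0 survives on importance and neuron 2 on the combined score, while neuron 3, weak under both criteria, is pruned—illustrating how the two signals complement each other.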