IDAP++: Advancing Divergence-Based Pruning via Filter-Level and Layer-Level Optimization

📅 2025-11-25
🤖 AI Summary
To address the coexistence of filter-level and architectural redundancy in neural network compression, this paper proposes a unified pruning framework based on information flow divergence. Methodologically, it unifies fine-grained filter pruning and coarse-grained layer removal within a single theoretical framework: tensor flow divergence quantifies the contribution of individual filters and entire layers to information propagation, and a two-stage iterative optimization first performs divergence-aware filter pruning, then eliminates inefficient modules based on layer-wise contribution analysis. The framework is architecture-agnostic, applying to CNNs, Transformers, and other modern architectures. Experiments on mainstream benchmarks demonstrate parameter compression ratios comparable to or exceeding state-of-the-art methods, with negligible accuracy degradation and markedly improved deployment efficiency under resource constraints. The core contribution is a cross-granularity, theoretically consistent, and interpretable joint compression paradigm.
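As a concrete illustration of the filter-level stage, the sketch below scores each convolutional filter with a simple divergence proxy on a calibration batch. This is a minimal sketch under assumptions: the Gaussian KL proxy, the helper names (`filter_divergence_scores`, `filters_to_prune`), and the `keep_ratio` parameter are illustrative stand-ins, not the paper's exact tensor-flow-divergence metric.

```python
# Hedged sketch of divergence-aware filter scoring (Stage 1). The Gaussian KL
# proxy below is an illustrative stand-in for the paper's tensor-flow-
# divergence metric, whose exact form is not reproduced here.
import torch
import torch.nn as nn

@torch.no_grad()
def filter_divergence_scores(layer: nn.Conv2d, x: torch.Tensor) -> torch.Tensor:
    """Score each output filter of `layer` on a calibration batch `x`
    (the batch is the *input* to the layer, shape (B, C_in, H, W))."""
    acts = layer(x)                    # (B, C_out, H', W')
    flat = acts.flatten(2)             # (B, C_out, H'*W')
    mu = flat.mean(dim=(0, 2))         # per-filter activation mean, (C_out,)
    var = flat.var(dim=(0, 2)) + 1e-8  # per-filter activation variance
    mu0, var0 = mu.mean(), var.mean()  # layer-wide reference Gaussian
    # KL( N(mu, var) || N(mu0, var0) ) per filter: low values mark filters
    # that barely reshape the information flow.
    return 0.5 * (torch.log(var0 / var) + (var + (mu - mu0) ** 2) / var0 - 1.0)

@torch.no_grad()
def filters_to_prune(layer: nn.Conv2d, x: torch.Tensor,
                     keep_ratio: float = 0.7) -> list[int]:
    """Indices of the lowest-scoring filters, nominated for removal."""
    scores = filter_divergence_scores(layer, x)
    n_drop = int(layer.out_channels * (1.0 - keep_ratio))
    return torch.argsort(scores)[:n_drop].tolist()
```

For example, `filters_to_prune(model.conv1, calib_batch, keep_ratio=0.7)` would nominate the 30% of conv1's filters whose activation distributions diverge least from the layer-wide distribution.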

๐Ÿ“ Abstract
This paper presents a novel approach to neural network compression that addresses redundancy at both the filter and architectural levels through a unified framework grounded in information flow analysis. Building on the concept of tensor flow divergence, which quantifies how information is transformed across network layers, we develop a two-stage optimization process. The first stage employs iterative divergence-aware pruning to identify and remove redundant filters while preserving critical information pathways. The second stage extends this principle to higher-level architecture optimization by analyzing layer-wise contributions to information propagation and selectively eliminating entire layers that have minimal impact on network performance. The method naturally adapts to diverse architectures, including convolutional networks, transformers, and hybrid designs, and provides a consistent metric for comparing structural importance across different layer types. Experimental validation across multiple modern architectures and datasets shows that the combined approach achieves substantial model compression while maintaining competitive accuracy: parameter reduction is comparable to state-of-the-art solutions overall and exceeds them across a wide range of modern neural network architectures, from convolutional models to transformers. The results demonstrate that flow divergence serves as an effective guiding principle for both filter-level and layer-level optimization, offering practical benefits for deployment in resource-constrained environments.
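To make the layer-level stage concrete, here is a minimal sketch that estimates a layer's contribution to information propagation by temporarily bypassing it with an identity map and measuring the loss increase on a calibration batch. The ablation proxy, the helper names (`layer_contributions`, `layers_to_drop`), and the threshold `tau` are assumptions for illustration, not the paper's exact divergence-based criterion.

```python
# Hedged sketch of layer-level contribution analysis (Stage 2). A layer's
# contribution is approximated by bypassing it with an identity map and
# measuring the loss increase; this ablation proxy is an assumption, not the
# paper's exact criterion.
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def layer_contributions(model: nn.Module, names: list[str],
                        x: torch.Tensor, y: torch.Tensor) -> dict[str, float]:
    """Map each named sub-module to the loss increase caused by bypassing it.
    `names` should list only shape-preserving layers, so that an identity
    bypass is well defined."""
    criterion = nn.CrossEntropyLoss()
    model.eval()
    base = criterion(model(x), y).item()
    scores = {}
    for name in names:
        ablated = copy.deepcopy(model)
        parent = ablated
        *path, leaf = name.split(".")
        for attr in path:                     # walk the dotted module path
            parent = getattr(parent, attr)
        setattr(parent, leaf, nn.Identity())  # bypass the layer
        scores[name] = criterion(ablated(x), y).item() - base
    return scores

def layers_to_drop(scores: dict[str, float], tau: float = 0.05) -> list[str]:
    """Layers whose removal costs less than `tau` extra loss are candidates."""
    return [name for name, s in scores.items() if s < tau]
```

Only shape-preserving sub-modules (e.g. residual blocks whose input and output dimensions match) are valid targets for an identity bypass, which is why layer removal is typically applied to such blocks in practice.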
Problem

Research questions and friction points this paper is trying to address.

How to remove filter-level and architecture-level redundancy jointly rather than with separate, inconsistent heuristics
How to quantify the contribution of individual filters and entire layers to information flow so that pruning follows a principled signal
How to cut parameter counts across diverse architectures while keeping accuracy intact
Innovation

Methods, ideas, or system contributions that make the work stand out.

Filter-level pruning via divergence-aware optimization
Layer-level optimization by analyzing information propagation
Unified framework adapting to diverse neural architectures (see the driver-loop sketch after this list)
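A minimal driver-loop sketch tying the two stages together, assuming hypothetical hooks (`prune_filters`, `drop_layers`, `fine_tune`, `evaluate`) for the actual structural surgery and retraining; the paper's actual schedule and stopping rule are not reproduced here.

```python
import torch.nn as nn

def two_stage_compress(model: nn.Module,
                       prune_filters,   # Stage 1 hook: remove low-divergence filters
                       drop_layers,     # Stage 2 hook: remove low-contribution layers
                       fine_tune,       # brief retraining to recover accuracy
                       evaluate,        # held-out accuracy, higher is better
                       rounds: int = 3,
                       acc_floor: float = 0.0) -> nn.Module:
    """Alternate filter-level and layer-level pruning, fine-tuning after each
    round, and roll back if held-out accuracy drops below `acc_floor`."""
    best = model
    for _ in range(rounds):
        model = prune_filters(model)
        model = drop_layers(model)
        model = fine_tune(model)
        if evaluate(model) < acc_floor:
            return best        # keep the last model that met the accuracy budget
        best = model
    return best
```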
Authors

Aleksei Samarin (Wayy LLC, Miami, FL 33132, USA)
Artem Nazarenko (Wayy LLC, Miami, FL 33132, USA)
Egor Kotenko (Wayy LLC, Miami, FL 33132, USA)
Valentin Malykh (MTS AI / ITMO University)
Alexander Savelev (Wayy LLC, Miami, FL 33132, USA)
Aleksei Toropov (Wayy LLC, Miami, FL 33132, USA)