IDAP++: Advancing Divergence-Based Pruning via Filter-Level and Layer-Level Optimization

📅 2025-11-25
🤖 AI Summary
To address the coexistence of filter-level and architectural redundancy in neural network compression, this paper proposes a unified pruning framework based on information flow divergence. Methodologically, it unifies fine-grained filter pruning and coarse-grained layer removal within a single theoretical framework: tensor flow divergence quantifies the contribution of individual filters and entire layers to information propagation, and a two-stage iterative optimization first performs divergence-aware filter pruning, then eliminates inefficient modules based on layer-wise contribution analysis. The framework is architecture-agnostic, applying to CNNs, Transformers, and other modern architectures. Experiments on mainstream benchmarks demonstrate parameter compression ratios comparable to or exceeding state-of-the-art methods, with negligible accuracy degradation and markedly improved deployment efficiency under resource constraints. The core contribution is a cross-granularity, theoretically consistent, and interpretable joint compression paradigm.
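As a concrete illustration of the filter-level stage, the sketch below scores each convolutional filter with a simple divergence proxy on a calibration batch. This is a minimal sketch under assumptions: the Gaussian KL proxy, the helper names (`filter_divergence_scores`, `filters_to_prune`), and the `keep_ratio` parameter are illustrative stand-ins, not the paper's exact tensor-flow-divergence metric.

```python
# Hedged sketch of divergence-aware filter scoring (Stage 1). The Gaussian KL
# proxy below is an illustrative stand-in for the paper's tensor-flow-
# divergence metric, whose exact form is not reproduced here.
import torch
import torch.nn as nn

@torch.no_grad()
def filter_divergence_scores(layer: nn.Conv2d, x: torch.Tensor) -> torch.Tensor:
    """Score each output filter of `layer` on a calibration batch `x`
    (the batch is the *input* to the layer, shape (B, C_in, H, W))."""
    acts = layer(x)                    # (B, C_out, H', W')
    flat = acts.flatten(2)             # (B, C_out, H'*W')
    mu = flat.mean(dim=(0, 2))         # per-filter activation mean, (C_out,)
    var = flat.var(dim=(0, 2)) + 1e-8  # per-filter activation variance
    mu0, var0 = mu.mean(), var.mean()  # layer-wide reference Gaussian
    # KL( N(mu, var) || N(mu0, var0) ) per filter: low values mark filters
    # that barely reshape the information flow.
    return 0.5 * (torch.log(var0 / var) + (var + (mu - mu0) ** 2) / var0 - 1.0)

@torch.no_grad()
def filters_to_prune(layer: nn.Conv2d, x: torch.Tensor,
                     keep_ratio: float = 0.7) -> list[int]:
    """Indices of the lowest-scoring filters, nominated for removal."""
    scores = filter_divergence_scores(layer, x)
    n_drop = int(layer.out_channels * (1.0 - keep_ratio))
    return torch.argsort(scores)[:n_drop].tolist()
```

For example, `filters_to_prune(model.conv1, calib_batch, keep_ratio=0.7)` would nominate the 30% of conv1's filters whose activation distributions diverge least from the layer-wide distribution.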

๐Ÿ“ Abstract
This paper presents a novel approach to neural network compression that addresses redundancy at both the filter and architectural levels through a unified framework grounded in information flow analysis. Building on the concept of tensor flow divergence, which quantifies how information is transformed across network layers, we develop a two-stage optimization process. The first stage employs iterative divergence-aware pruning to identify and remove redundant filters while preserving critical information pathways. The second stage extends this principle to higher-level architecture optimization by analyzing layer-wise contributions to information propagation and selectively eliminating entire layers that have minimal impact on network performance. The method naturally adapts to diverse architectures, including convolutional networks, transformers, and hybrid designs, and provides a consistent metric for comparing structural importance across different layer types. Experimental validation across multiple modern architectures and datasets shows that the combined approach achieves substantial model compression while maintaining competitive accuracy: parameter reduction is comparable to state-of-the-art solutions overall and exceeds them across a wide range of modern neural network architectures, from convolutional models to transformers. The results demonstrate that flow divergence serves as an effective guiding principle for both filter-level and layer-level optimization, offering practical benefits for deployment in resource-constrained environments.
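To make the layer-level stage concrete, here is a minimal sketch that estimates a layer's contribution to information propagation by temporarily bypassing it with an identity map and measuring the loss increase on a calibration batch. The ablation proxy, the helper names (`layer_contributions`, `layers_to_drop`), and the threshold `tau` are assumptions for illustration, not the paper's exact divergence-based criterion.

```python
# Hedged sketch of layer-level contribution analysis (Stage 2). A layer's
# contribution is approximated by bypassing it with an identity map and
# measuring the loss increase; this ablation proxy is an assumption, not the
# paper's exact criterion.
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def layer_contributions(model: nn.Module, names: list[str],
                        x: torch.Tensor, y: torch.Tensor) -> dict[str, float]:
    """Map each named sub-module to the loss increase caused by bypassing it.
    `names` should list only shape-preserving layers, so that an identity
    bypass is well defined."""
    criterion = nn.CrossEntropyLoss()
    model.eval()
    base = criterion(model(x), y).item()
    scores = {}
    for name in names:
        ablated = copy.deepcopy(model)
        parent = ablated
        *path, leaf = name.split(".")
        for attr in path:                     # walk the dotted module path
            parent = getattr(parent, attr)
        setattr(parent, leaf, nn.Identity())  # bypass the layer
        scores[name] = criterion(ablated(x), y).item() - base
    return scores

def layers_to_drop(scores: dict[str, float], tau: float = 0.05) -> list[str]:
    """Layers whose removal costs less than `tau` extra loss are candidates."""
    return [name for name, s in scores.items() if s < tau]
```

Only shape-preserving sub-modules (e.g. residual blocks whose input and output dimensions match) are valid targets for an identity bypass, which is why layer removal is typically applied to such blocks in practice.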
Problem

Research questions and friction points this paper is trying to address.

How to remove filter-level and architecture-level redundancy jointly rather than with separate, inconsistent heuristics
How to quantify the contribution of individual filters and entire layers to information flow so that pruning follows a principled signal
How to cut parameter counts across diverse architectures while keeping accuracy intact
Innovation

Methods, ideas, or system contributions that make the work stand out.

Filter-level pruning via divergence-aware optimization
Layer-level optimization by analyzing information propagation
Unified framework adapting to diverse neural architectures (see the driver-loop sketch after this list)
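A minimal driver-loop sketch tying the two stages together, assuming hypothetical hooks (`prune_filters`, `drop_layers`, `fine_tune`, `evaluate`) for the actual structural surgery and retraining; the paper's actual schedule and stopping rule are not reproduced here.

```python
import torch.nn as nn

def two_stage_compress(model: nn.Module,
                       prune_filters,   # Stage 1 hook: remove low-divergence filters
                       drop_layers,     # Stage 2 hook: remove low-contribution layers
                       fine_tune,       # brief retraining to recover accuracy
                       evaluate,        # held-out accuracy, higher is better
                       rounds: int = 3,
                       acc_floor: float = 0.0) -> nn.Module:
    """Alternate filter-level and layer-level pruning, fine-tuning after each
    round, and roll back if held-out accuracy drops below `acc_floor`."""
    best = model
    for _ in range(rounds):
        model = prune_filters(model)
        model = drop_layers(model)
        model = fine_tune(model)
        if evaluate(model) < acc_floor:
            return best        # keep the last model that met the accuracy budget
        best = model
    return best
```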
Authors

Aleksei Samarin (Wayy LLC, Miami, FL 33132, USA)
Artem Nazarenko (Wayy LLC, Miami, FL 33132, USA)
Egor Kotenko (Wayy LLC, Miami, FL 33132, USA)
Valentin Malykh (MTS AI / ITMO University)
Alexander Savelev (Wayy LLC, Miami, FL 33132, USA)
Aleksei Toropov (Wayy LLC, Miami, FL 33132, USA)