MI-to-Mid Distilled Compression (M2M-DC): A Hybrid Information-Guided Block Pruning with Progressive Inner Slicing Approach to Model Compression

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the accuracy degradation and structural distortion commonly encountered in deep learning model compression. We propose a dual-scale, shape-preserving lightweighting framework that jointly exploits cross-layer and intra-layer redundancy while strictly preserving residual or inverted-residual topologies. Methodologically, it integrates label-aware mutual information–driven residual block ranking and pruning, stage-consistent structured channel slicing, intermediate channel pruning, and multi-stage knowledge distillation. Our key contribution is the first unification of mutual information–based evaluation with progressive structured pruning within a dual-scale compression paradigm, applicable to general residual architectures. Experiments demonstrate state-of-the-art performance: on CIFAR-100, ResNet-18 achieves 72% parameter reduction and 63% FLOPs reduction with accuracy improvement; MobileNetV2 attains a +2.5% accuracy gain using only 27% of its original parameters.

📝 Abstract
We introduce MI-to-Mid Distilled Compression (M2M-DC), a two-scale, shape-safe compression framework that interleaves information-guided block pruning with progressive inner slicing and staged knowledge distillation (KD). First, M2M-DC ranks residual (or inverted-residual) blocks by a label-aware mutual information (MI) signal and removes the least informative units (structured prune-after-training). It then alternates short KD phases with stage-coherent, residual-safe channel slicing: (i) stage "planes" (co-slicing conv2 out-channels with the downsample path and next-stage inputs), and (ii) an optional mid-channel trim (conv1 out / bn1 / conv2 in). This targets complementary redundancy: whole computational motifs and within-stage width, while preserving residual shape invariants. On CIFAR-100, M2M-DC yields a clean accuracy-compute frontier. For ResNet-18, we obtain 85.46% Top-1 with 3.09M parameters and 0.0139 GMacs (72% params, 63% GMacs vs. teacher; mean final 85.29% over three seeds). For ResNet-34, we reach 85.02% Top-1 with 5.46M params and 0.0195 GMacs (74% / 74% vs. teacher; mean final 84.62%). Extending to inverted residuals, MobileNetV2 achieves a mean final 68.54% Top-1 at 1.71M params (27%) and 0.0186 conv GMacs (24%), improving over the teacher's 66.03% by +2.5 points across three seeds. Because M2M-DC exposes only a thin, architecture-aware interface (blocks, stages, and downsample/skip wiring), it generalizes across residual CNNs and extends to inverted-residual families with minor legalization rules. The result is a compact, practical recipe for deployment-ready models that match or surpass teacher accuracy at a fraction of the compute.
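The label-aware MI ranking described above can be sketched as follows. The abstract does not specify the MI estimator or how block activations are summarized, so the two-bin quantization and the function names here are illustrative assumptions, not the paper's implementation:

```python
# Sketch of label-aware MI scoring for residual blocks, assuming a
# plug-in histogram MI estimator over binned activation summaries.
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in MI estimate (in nats) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        pj = c / n
        mi += pj * math.log(pj * n * n / (px[x] * py[y]))
    return max(mi, 0.0)  # clamp tiny negative rounding error

def rank_blocks(block_activations, labels):
    """Score each block by MI(binned activation summary; label);
    lowest-MI blocks are the pruning candidates."""
    scores = {}
    for name, acts in block_activations.items():
        binned = [1 if a > 0 else 0 for a in acts]  # crude 2-bin quantization
        scores[name] = mutual_information(binned, labels)
    return sorted(scores, key=scores.get)  # least informative first

# Toy example: block "b1" tracks the labels, "b2" carries no information.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
acts = {"b1": [-1, 2, -2, 3, -1, 1, -3, 2],   # sign matches the label
        "b2": [1, 1, 1, 1, 1, 1, 1, 1]}       # constant -> zero MI
order = rank_blocks(acts, labels)
print(order)  # -> ['b2', 'b1']: "b2" is the first candidate to prune
```

Under this sketch, the "structured prune-after-training" step would simply drop the blocks at the front of `order` until a compute budget is met, then hand the truncated network to the KD phases.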
Problem

Research questions and friction points this paper is trying to address.

Compresses neural networks by pruning blocks using mutual information guidance
Reduces model redundancy through progressive channel slicing and knowledge distillation
Maintains accuracy while significantly decreasing parameters and computational costs
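The "residual-safe" part of the channel slicing amounts to shape bookkeeping: keeping a subset of a stage's output planes means slicing conv2's out-channels, the downsample (skip) path's out-channels, and the next stage's in-channels together. A minimal sketch, assuming the usual `[out_ch, in_ch, k, k]` weight-shape convention (the helper name is hypothetical, not the paper's API):

```python
# Stage-plane co-slicing: the residual add is only legal if the main
# path and the skip path emit the same number of channels, and the next
# stage must consume exactly that many. Shapes are (out, in, k, k).

def co_slice_stage(shapes, keep):
    """Return updated weight shapes after keeping `keep` output planes."""
    s = dict(shapes)
    o, i, k1, k2 = s["conv2"]
    s["conv2"] = (keep, i, k1, k2)          # main-path out-channels
    o, i, k1, k2 = s["downsample"]
    s["downsample"] = (keep, i, k1, k2)     # skip-path out-channels
    o, i, k1, k2 = s["next_conv1"]
    s["next_conv1"] = (o, keep, k1, k2)     # next stage consumes `keep` planes
    return s

stage = {"conv2": (64, 64, 3, 3),
         "downsample": (64, 32, 1, 1),
         "next_conv1": (128, 64, 3, 3)}
sliced = co_slice_stage(stage, keep=48)

# Residual shape invariant: both branches and the next consumer agree.
assert sliced["conv2"][0] == sliced["downsample"][0] == sliced["next_conv1"][1]
print(sliced["conv2"], sliced["downsample"], sliced["next_conv1"])
```

The optional mid-channel trim (conv1 out / bn1 / conv2 in) is the analogous move one level down, entirely inside a block, so it never touches the skip connection at all.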
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid information-guided block pruning for structured compression
Progressive inner slicing of residual blocks and channels
Staged knowledge distillation interleaved with pruning and slicing
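The short KD phases interleaved with pruning typically optimize a standard distillation objective. A minimal sketch of such a loss, with temperature and weighting values chosen for illustration (the paper's exact settings are not given in this summary):

```python
# Distillation objective: temperature-softened KL from teacher to
# student plus the usual cross-entropy on the ground-truth label.
import math

def softmax(z, t=1.0):
    m = max(z)                                   # subtract max for stability
    e = [math.exp((v - m) / t) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kd_loss(student_logits, teacher_logits, label, t=4.0, alpha=0.7):
    p_t = softmax(teacher_logits, t)
    p_s = softmax(student_logits, t)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    ce = -math.log(softmax(student_logits)[label])
    # t*t rescales the KL gradient to match the unsoftened scale.
    return alpha * (t * t) * kl + (1 - alpha) * ce

loss_far = kd_loss([0.0, 0.0, 0.0], [5.0, 0.0, 0.0], label=0)
loss_near = kd_loss([5.0, 0.0, 0.0], [5.0, 0.0, 0.0], label=0)
print(loss_near < loss_far)  # matching the teacher lowers the loss
```

Alternating this objective with the block pruning and slicing steps lets each compression move be "healed" before the next one, which is the progressive part of the recipe.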
Lionel Levine
Professor, Cornell University
probability, combinatorics, statistical mechanics, AI safety
Haniyeh Ehsani Oskouie
Department of Computer Science, UCLA
Sajjad Ghiasvand
PhD Student at UC Santa Barbara
Machine Learning, Optimization, PEFT
Majid Sarrafzadeh
Department of Computer Science, UCLA