🤖 AI Summary
Existing energy-efficiency optimization methods for neural network accelerators suffer from limited hardware-level effectiveness due to reliance on global activation models, coarse-grained energy proxies, or layer-agnostic compression strategies. To address this, we propose an energy-aware per-layer weight compression framework. Our approach introduces a layer-aware MAC energy model that jointly incorporates activation statistics and MSB-Hamming-distance-driven partial-sum transition modeling, enabling fine-grained, tile-level energy estimation under systolic mapping. We further design a joint energy-accuracy optimization algorithm for weight selection and an energy-prioritized hierarchical compression strategy, targeting high-energy layers while satisfying global accuracy constraints. Experimental results on mainstream CNNs demonstrate up to 58.6% energy reduction with only 2–3% accuracy degradation, substantially outperforming state-of-the-art power-aware methods.
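The energy-prioritized hierarchical strategy summarized above can be sketched as a greedy loop: visit layers in descending order of estimated energy and keep each layer's compression only while the global accuracy constraint still holds. This is a minimal illustration under stated assumptions; the layer names, per-layer accuracy costs, evaluation callback, and threshold below are all hypothetical, not values from the paper.

```python
def energy_prioritized_schedule(layers, energy, accuracy_after, acc_floor):
    """Greedy sketch of an energy-prioritized schedule.

    Compress layers in descending estimated-energy order, keeping a
    layer's compression only while global accuracy stays at or above
    `acc_floor`. `accuracy_after(compressed)` is a hypothetical
    evaluation callback; `energy` maps layer name -> estimated energy.
    """
    order = sorted(layers, key=energy.get, reverse=True)
    compressed = set()
    for layer in order:
        compressed.add(layer)
        if accuracy_after(compressed) < acc_floor:
            compressed.discard(layer)  # revert: accuracy budget exceeded
    return order, compressed

# Toy example: energies and accuracy costs are made up for illustration.
energy = {"conv1": 5.0, "conv2": 9.0, "fc": 2.0}
cost = {"conv1": 0.010, "conv2": 0.015, "fc": 0.020}
accuracy_after = lambda s: 0.76 - sum(cost[l] for l in s)

order, chosen = energy_prioritized_schedule(list(energy), energy,
                                            accuracy_after, 0.73)
```

In this toy run the two highest-energy layers (`conv2`, `conv1`) are compressed, while `fc` is reverted because it would push accuracy below the floor.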
📝 Abstract
Systolic array accelerators execute CNNs with energy dominated by the switching activity of multiply-accumulate (MAC) units. Although prior work exploits weight-dependent MAC power for compression, existing methods often use global activation models, coarse energy proxies, or layer-agnostic policies, which limits their effectiveness on real hardware. We propose an energy-aware, layer-wise compression framework that explicitly leverages MAC- and layer-level energy characteristics. First, we build a layer-aware MAC energy model that combines per-layer activation statistics with an MSB-Hamming-distance grouping of 22-bit partial-sum transitions, and integrate it with a tile-level systolic mapping to estimate convolution-layer energy. On top of this model, we introduce an energy-accuracy co-optimized weight selection algorithm within quantization-aware training and an energy-prioritized layer-wise schedule that compresses high-energy layers more aggressively under a global accuracy constraint. Experiments on several CNN models demonstrate up to 58.6% energy reduction with a 2–3% accuracy drop, outperforming a state-of-the-art power-aware baseline.
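The MSB-Hamming-distance grouping of partial-sum transitions can be illustrated with a small sketch. Assuming 22-bit two's-complement partial sums, the idea is to bucket each consecutive partial-sum transition by how many of its most significant bits flip, then weight each bucket by an energy cost. The MSB window width, the energy table, and the `tile_transition_energy` helper below are hypothetical stand-ins, not the paper's calibrated model.

```python
PSUM_BITS = 22  # partial-sum register width, per the abstract
MSB_BITS = 6    # hypothetical MSB window used for grouping

def to_bits(x, width=PSUM_BITS):
    """Two's-complement bit list of an integer, MSB first."""
    return [(x >> i) & 1 for i in range(width - 1, -1, -1)]

def msb_hamming_distance(a, b, msb=MSB_BITS):
    """Hamming distance restricted to the top `msb` bits."""
    return sum(u != v for u, v in zip(to_bits(a)[:msb], to_bits(b)[:msb]))

def tile_transition_energy(psums, group_energy):
    """Sum per-transition costs over a tile's partial-sum trace,
    where `group_energy[d]` is a (hypothetical) cost for MSB
    Hamming distance d."""
    return sum(group_energy[msb_hamming_distance(p, c)]
               for p, c in zip(psums, psums[1:]))

# Hypothetical monotone table: more MSB flips -> more switching energy.
group_energy = {d: 1.0 + 0.5 * d for d in range(MSB_BITS + 1)}
psums = [0, 5, -3, 1024, -1024]  # toy partial-sum trace
tile_energy = tile_transition_energy(psums, group_energy)
```

Note that sign changes dominate here: a transition that crosses zero (e.g. `5 -> -3`) flips all six MSBs of the two's-complement representation, which is exactly the kind of high-cost event an MSB-based grouping is meant to capture.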