ReGate: Enabling Power Gating in Neural Processing Units

📅 2025-08-04
🤖 AI Summary
Static power dissipation accounts for 30–72% of the energy consumed by modern NPU chips, largely because these chips lack power management support, and this limits datacenter energy efficiency and sustainability. ReGate addresses the problem with fine-grained power gating tailored to each NPU component through hardware/software co-design. Systolic arrays are gated at cycle level per processing element, inter-chip interconnect and HBM controllers use a lightweight hardware idle-detection mechanism, and vector units and SRAM are gated by software (for example, the compiler) through an extended NPU ISA. Implemented on a production-level NPU simulator, ReGate reduces energy consumption by 15.5% on average (up to 32.8%), with less than 3.3% hardware overhead and negligible impact on AI workload performance.

📝 Abstract
The energy efficiency of neural processing units (NPUs) is playing a critical role in developing sustainable data centers. Our study of different generations of NPU chips reveals that 30%-72% of their energy consumption is contributed by static power dissipation, due to the lack of power management support in modern NPU chips. In this paper, we present ReGate, which enables fine-grained power gating of each hardware component in NPU chips with hardware/software co-design. Unlike conventional power-gating techniques for generic processors, enabling power gating in NPUs faces unique challenges due to the fundamental differences in hardware architecture and program execution model. To address these challenges, we carefully investigate the power-gating opportunities in each component of NPU chips and decide the best-fit power management scheme (i.e., hardware- vs. software-managed power gating). Specifically, for systolic arrays (SAs) that have deterministic execution patterns, ReGate enables cycle-level power gating at the granularity of processing elements (PEs) following the inherent dataflow execution in SAs. For inter-chip interconnect (ICI) and HBM controllers that have long idle intervals, ReGate employs a lightweight hardware-based idle-detection mechanism. For vector units and SRAM, whose idle periods vary significantly depending on workload patterns, ReGate extends the NPU ISA and allows software such as compilers to manage the power gating. With an implementation on a production-level NPU simulator, we show that ReGate can reduce the energy consumption of NPU chips by up to 32.8% (15.5% on average), with negligible impact on AI workload performance. The hardware implementation of power-gating logic introduces less than 3.3% overhead in NPU chips.
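
The hardware idle-detection idea mentioned for ICI and HBM controllers can be illustrated with a generic timeout policy. This is only a minimal sketch and not ReGate's actual circuit: the function `simulate_idle_gating`, the `GateStats` record, and the parameters `idle_threshold` and `wakeup_latency` are hypothetical names chosen for illustration. The sketch assumes a unit is powered off after a fixed number of consecutive idle cycles and pays a fixed wake-up delay when work arrives again.

```python
from dataclasses import dataclass


@dataclass
class GateStats:
    gated_cycles: int   # cycles the unit spent powered off
    wakeups: int        # number of power-on events
    stall_cycles: int   # cycles of work delayed by wake-up latency


def simulate_idle_gating(busy, idle_threshold, wakeup_latency):
    """Replay a per-cycle busy/idle trace under a timeout gating policy.

    busy: iterable of bools, True when the unit has work in that cycle.
    idle_threshold: consecutive idle cycles observed before gating (assumed).
    wakeup_latency: cycles charged each time a gated unit is woken (assumed).
    """
    gated = False
    idle_count = 0
    stats = GateStats(gated_cycles=0, wakeups=0, stall_cycles=0)
    for has_work in busy:
        if has_work:
            if gated:
                # Work arrived while powered off: wake up and pay the delay.
                stats.wakeups += 1
                stats.stall_cycles += wakeup_latency
                gated = False
            idle_count = 0
        elif gated:
            stats.gated_cycles += 1
        else:
            idle_count += 1
            if idle_count >= idle_threshold:
                gated = True  # power off starting with the next cycle
    return stats


# Hypothetical trace: a long idle gap (gated) and a short one (not gated).
trace = [True] * 2 + [False] * 10 + [True] * 3 + [False] * 2 + [True]
stats = simulate_idle_gating(trace, idle_threshold=4, wakeup_latency=2)
```

A larger threshold avoids gating short gaps that would not repay the wake-up cost, at the price of leaving the unit powered for longer at the start of every long gap. That trade-off is one plausible reason components with long idle intervals suit a simple hardware detector.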

Problem

Research questions and friction points this paper is trying to address.

Reducing static power dissipation in NPU chips
Enabling fine-grained power-gating in NPU components
Addressing unique power management challenges in NPUs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hardware/software co-design for NPU power gating
Cycle-level power gating for systolic arrays
Lightweight idle-detection for inter-chip interconnect
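
To see why deterministic dataflow makes cycle-level gating of processing elements feasible, consider a textbook skewed systolic array rather than ReGate's specific design. The sketch below assumes PE (row, col) first receives data at cycle `row + col` and then works for `stream_len` consecutive cycles; the function names and the timing model are illustrative assumptions. Under that model, the cycles in which each PE is idle are known ahead of time, so no runtime detection is needed.

```python
def pe_active_window(row, col, stream_len):
    """Half-open cycle range [start, end) in which PE (row, col) is busy.

    Assumes a skewed wavefront: data reaches PE (row, col) at cycle row + col.
    """
    start = row + col
    return start, start + stream_len


def gateable_pes(cycle, rows, cols, stream_len):
    """Count PEs that are idle, and so could be gated, in the given cycle."""
    idle = 0
    for r in range(rows):
        for c in range(cols):
            start, end = pe_active_window(r, c, stream_len)
            if not (start <= cycle < end):
                idle += 1
    return idle


def idle_fraction(rows, cols, stream_len):
    """Fraction of all PE-cycles that are idle over one pass through the array."""
    total_cycles = stream_len + rows + cols - 2  # last PE finishes here
    idle = sum(gateable_pes(t, rows, cols, stream_len) for t in range(total_cycles))
    return idle / (rows * cols * total_cycles)


# Hypothetical 4x4 array streaming 8 values per PE.
first = pe_active_window(0, 0, 8)
last = pe_active_window(3, 3, 8)
frac = idle_fraction(4, 4, 8)
```

In this model the idle fraction reduces to 1 - K / (K + R + C - 2) for an R x C array and stream length K, so short streams on large arrays leave most PE-cycles idle during fill and drain.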
Yuqi Xue
University of Illinois Urbana-Champaign
Computer Architecture
Jian Huang
University of Illinois Urbana-Champaign