🤖 AI Summary
This work addresses the challenge of balancing energy efficiency and performance in power-constrained edge devices, where conventional dynamic voltage and frequency scaling (DVFS) struggles due to overly coarse model-level granularity or hardware switching latency at the operator level. The authors propose a fine-grained, sparsity-aware DVFS framework that classifies operators as compute- or memory-intensive based on their sparsity characteristics and assigns tailored CPU/GPU/EMC frequency triplets accordingly. Key innovations include an offline mapping derived from white-box timing analysis linking sparsity patterns to optimal frequencies, a runtime greedy graph partitioning strategy that forms superblocks to balance scheduling granularity against switching overhead, and a unified cooperative controller (FUSE) that resolves multi-controller conflicts and masks latency. Experiments demonstrate a 78.17% average improvement in energy efficiency over the state-of-the-art, with a cost-benefit ratio of 14%.
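The sparsity-based classification and offline frequency mapping described above can be sketched as a simple lookup. Everything below is an illustrative assumption, not the paper's actual table: the class names, the sparsity threshold, and the frequency values are invented for exposition; the real mapping is derived from white-box timing analysis.

```python
# Hypothetical sketch of the offline sparsity -> frequency-triplet mapping.
# All names, thresholds, and frequency values are illustrative assumptions.

from dataclasses import dataclass

@dataclass(frozen=True)
class FreqTriplet:
    cpu_khz: int
    gpu_khz: int
    emc_khz: int  # external memory controller clock

# Assumed offline table: dense (compute-bound) operators favor a high GPU
# clock; sparse (memory-bound) operators favor a high EMC clock instead.
OFFLINE_TABLE = {
    "compute_bound": FreqTriplet(cpu_khz=1_900_000, gpu_khz=1_300_000, emc_khz=1_600_000),
    "memory_bound":  FreqTriplet(cpu_khz=1_200_000, gpu_khz=700_000,  emc_khz=2_100_000),
}

SPARSITY_THRESHOLD = 0.5  # assumed cutoff separating the two classes

def classify(sparsity: float) -> str:
    """Map an operator's sparsity ratio (fraction of zeros) to a workload class."""
    return "memory_bound" if sparsity >= SPARSITY_THRESHOLD else "compute_bound"

def triplet_for(sparsity: float) -> FreqTriplet:
    """Return the frequency triplet the runtime would request for this operator."""
    return OFFLINE_TABLE[classify(sparsity)]
```

At runtime, each operator's sparsity is measured (or predicted) and the corresponding triplet is requested before the operator executes; the super-block partitioning below decides how often such requests are actually issued.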
📝 Abstract
Deploying deep neural networks (DNNs) on power-sensitive edge devices presents a formidable challenge. While Dynamic Voltage and Frequency Scaling (DVFS) is widely employed for energy optimization, traditional model-level scaling is often too coarse to capture intra-inference variations, whereas fine-grained operator-level scaling suffers from prohibitive performance degradation due to significant hardware switching latency. This paper presents SparseDVFS, a fine-grained, sparsity-aware DVFS framework designed for energy-efficient edge inference. Our key insight is that operator sparsity serves as a primary signal for hardware frequency modulation. By distinguishing between compute-bound dense operators and memory-bound sparse operators, the system can apply specialized frequency triplets to maximize energy efficiency. To overcome switching overheads and component interference, SparseDVFS incorporates three key innovations: (1) an offline modeler that establishes a deterministic mapping between operator sparsity and optimal frequency triplets (CPU/GPU/EMC) via white-box timing analysis; (2) a runtime graph partitioner that uses a greedy merging heuristic to aggregate operators into super-blocks, balancing scaling granularity against DVFS switching latency through a latency amortization constraint; and (3) a unified co-governor that employs a Frequency Unified Scaling Engine (FUSE) and a look-ahead instruction queue to eliminate antagonistic effects between independent controllers and hide hardware transition latencies. Extensive evaluations show that SparseDVFS achieves an average 78.17% energy-efficiency gain over state-of-the-art solutions while maintaining a superior 14% cost-gain ratio.
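The greedy merging heuristic with a latency amortization constraint might look like the following minimal sketch. The operator runtimes, switch latency, and amortization factor are assumed placeholder values, and the merge rule (close a block only when the workload class changes *and* the block is long enough to pay for one frequency switch) is one plausible reading of the constraint, not the paper's exact algorithm.

```python
# Illustrative greedy super-block partitioner. A frequency switch is only
# worthwhile if the block it opens runs long enough to amortize the switch
# cost; otherwise operators of a different class are merged into the
# current block and inherit its frequency setting.

def partition(ops, switch_ms=2.0, factor=10.0):
    """ops: list of (name, runtime_ms, workload_class) tuples in graph order.
    switch_ms: assumed hardware frequency-switch latency.
    factor: a block must run >= factor * switch_ms to justify a switch.
    Returns a list of super-blocks as dicts."""
    blocks = []
    cur_ops, cur_rt, cur_cls = [], 0.0, None
    for name, rt, cls in ops:
        if cur_cls is None:
            cur_cls = cls
        # Open a new super-block only when the class changes AND the
        # current block already amortizes one frequency switch.
        if cls != cur_cls and cur_rt >= factor * switch_ms:
            blocks.append({"ops": cur_ops, "runtime_ms": cur_rt, "cls": cur_cls})
            cur_ops, cur_rt, cur_cls = [], 0.0, cls
        cur_ops.append(name)
        cur_rt += rt
    if cur_ops:
        blocks.append({"ops": cur_ops, "runtime_ms": cur_rt, "cls": cur_cls})
    return blocks
```

With the assumed defaults (20 ms amortization threshold), a run of short alternating-class operators stays fused into one super-block under a single frequency triplet, while long homogeneous runs form their own blocks, which is the granularity-vs-overhead trade-off the partitioner targets.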