🤖 AI Summary
This work addresses the challenge of balancing energy efficiency and performance in power-constrained edge devices, where conventional dynamic voltage and frequency scaling (DVFS) struggles due to overly coarse model-level granularity or hardware switching latency at the operator level. The authors propose a fine-grained, sparsity-aware DVFS framework that classifies operators as compute- or memory-intensive based on their sparsity characteristics and assigns tailored CPU/GPU/EMC frequency triplets accordingly. Key innovations include an offline mapping derived from white-box timing analysis linking sparsity patterns to optimal frequencies, a runtime greedy graph partitioning strategy that forms superblocks to balance scheduling granularity against switching overhead, and a unified cooperative controller (FUSE) that resolves multi-controller conflicts and masks latency. Experiments demonstrate a 78.17% average improvement in energy efficiency over the state-of-the-art, with a cost-benefit ratio of 14%.
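The sparsity-based classification and offline frequency mapping described above can be sketched as a simple lookup. Everything below is an illustrative assumption, not the paper's actual table: the class names, the sparsity threshold, and the frequency values are invented for exposition; the real mapping is derived from white-box timing analysis.

```python
# Hypothetical sketch of the offline sparsity -> frequency-triplet mapping.
# All names, thresholds, and frequency values are illustrative assumptions.

from dataclasses import dataclass

@dataclass(frozen=True)
class FreqTriplet:
    cpu_khz: int
    gpu_khz: int
    emc_khz: int  # external memory controller clock

# Assumed offline table: dense (compute-bound) operators favor a high GPU
# clock; sparse (memory-bound) operators favor a high EMC clock instead.
OFFLINE_TABLE = {
    "compute_bound": FreqTriplet(cpu_khz=1_900_000, gpu_khz=1_300_000, emc_khz=1_600_000),
    "memory_bound":  FreqTriplet(cpu_khz=1_200_000, gpu_khz=700_000,  emc_khz=2_100_000),
}

SPARSITY_THRESHOLD = 0.5  # assumed cutoff separating the two classes

def classify(sparsity: float) -> str:
    """Map an operator's sparsity ratio (fraction of zeros) to a workload class."""
    return "memory_bound" if sparsity >= SPARSITY_THRESHOLD else "compute_bound"

def triplet_for(sparsity: float) -> FreqTriplet:
    """Return the frequency triplet the runtime would request for this operator."""
    return OFFLINE_TABLE[classify(sparsity)]
```

At runtime, each operator's sparsity is measured (or predicted) and the corresponding triplet is requested before the operator executes; the super-block partitioning below decides how often such requests are actually issued.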
📝 Abstract
Deploying deep neural networks (DNNs) on power-sensitive edge devices presents a formidable challenge. While Dynamic Voltage and Frequency Scaling (DVFS) is widely employed for energy optimization, traditional model-level scaling is often too coarse to capture intra-inference variations, whereas fine-grained operator-level scaling suffers from prohibitive performance degradation due to significant hardware switching latency. This paper presents SparseDVFS, a fine-grained, sparsity-aware DVFS framework designed for energy-efficient edge inference. Our key insight is that operator sparsity serves as a primary signal for hardware frequency modulation. By distinguishing between compute-bound dense operators and memory-bound sparse operators, the system can apply specialized frequency triplets to maximize energy efficiency. To overcome switching overheads and component interference, SparseDVFS incorporates three key innovations: (1) an offline modeler that establishes a deterministic mapping between operator sparsity and optimal frequency triplets (CPU/GPU/EMC) via white-box timing analysis; (2) a runtime graph partitioner that uses a greedy merging heuristic to aggregate operators into super-blocks, balancing scaling granularity against DVFS switching latency through a latency amortization constraint; and (3) a unified co-governor that employs a Frequency Unified Scaling Engine (FUSE) and a look-ahead instruction queue to eliminate antagonistic effects between independent controllers and hide hardware transition latencies. Extensive evaluations show that SparseDVFS achieves an average 78.17% energy-efficiency gain over state-of-the-art solutions while maintaining a superior 14% cost-gain ratio.
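The greedy merging heuristic with a latency amortization constraint might look like the following minimal sketch. The operator runtimes, switch latency, and amortization factor are assumed placeholder values, and the merge rule (close a block only when the workload class changes *and* the block is long enough to pay for one frequency switch) is one plausible reading of the constraint, not the paper's exact algorithm.

```python
# Illustrative greedy super-block partitioner. A frequency switch is only
# worthwhile if the block it opens runs long enough to amortize the switch
# cost; otherwise operators of a different class are merged into the
# current block and inherit its frequency setting.

def partition(ops, switch_ms=2.0, factor=10.0):
    """ops: list of (name, runtime_ms, workload_class) tuples in graph order.
    switch_ms: assumed hardware frequency-switch latency.
    factor: a block must run >= factor * switch_ms to justify a switch.
    Returns a list of super-blocks as dicts."""
    blocks = []
    cur_ops, cur_rt, cur_cls = [], 0.0, None
    for name, rt, cls in ops:
        if cur_cls is None:
            cur_cls = cls
        # Open a new super-block only when the class changes AND the
        # current block already amortizes one frequency switch.
        if cls != cur_cls and cur_rt >= factor * switch_ms:
            blocks.append({"ops": cur_ops, "runtime_ms": cur_rt, "cls": cur_cls})
            cur_ops, cur_rt, cur_cls = [], 0.0, cls
        cur_ops.append(name)
        cur_rt += rt
    if cur_ops:
        blocks.append({"ops": cur_ops, "runtime_ms": cur_rt, "cls": cur_cls})
    return blocks
```

With the assumed defaults (20 ms amortization threshold), a run of short alternating-class operators stays fused into one super-block under a single frequency triplet, while long homogeneous runs form their own blocks, which is the granularity-vs-overhead trade-off the partitioner targets.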