Reducing Compute Waste in LLMs through Kernel-Level DVFS

📅 2026-01-13
🤖 AI Summary
This work addresses the significant energy inefficiency in large language model (LLM) training and inference, where substantial power is wasted without proportional performance gains. To improve energy efficiency while preserving model performance, the authors propose a fine-grained, kernel-level dynamic voltage and frequency scaling (DVFS) approach. Unlike prior methods that apply DVFS at the layer or iteration level, this technique tailors voltage and frequency settings dynamically to individual computational kernels, leveraging GPU performance modeling and parallelism analysis to identify optimal configurations. Evaluated on GPT-3 training tasks, the method achieves up to 14.6% energy savings with only a 0.6% performance overhead. Furthermore, it demonstrates strong generalization across diverse data and tensor parallelism setups, markedly enhancing overall energy efficiency.

📝 Abstract
The rapid growth of AI has fueled the expansion of accelerator- or GPU-based data centers. However, the rising operational energy consumption has emerged as a critical bottleneck and a major sustainability concern. Dynamic Voltage and Frequency Scaling (DVFS) is a well-known technique for reducing energy consumption, and thus improving energy efficiency, since it requires little effort and works with existing hardware. Reducing the energy consumption of training and inference of Large Language Models (LLMs) through DVFS or power capping is feasible: related work has shown that energy savings can be significant, but at the cost of significant slowdowns. In this work, we focus on reducing waste in LLM operations, i.e., reducing energy consumption without losing performance. We propose a fine-grained, kernel-level DVFS approach that explores new frequency configurations, and show that these save more energy than previous, pass- or iteration-level solutions. For example, for a GPT-3 training run, a pass-level approach could reduce energy consumption by 2% (without losing performance), while our kernel-level approach saves as much as 14.6% (with a 0.6% slowdown). We further investigate the effect of data and tensor parallelism, and show our discovered clock frequencies translate well to both. We conclude that kernel-level DVFS is a suitable technique to reduce waste in LLM operations, providing significant energy savings with negligible slowdown.
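To make the kernel-level idea concrete, here is a minimal, hypothetical sketch of how a per-kernel frequency could be chosen. It is not the paper's actual method: it assumes a simple roofline-style timing model (compute-bound time scales with clock, memory-bound time does not) and a cubic dynamic-power model, then picks the lowest-energy clock within a small slowdown budget. All function names, models, and parameters here are illustrative assumptions.

```python
# Illustrative kernel-level DVFS frequency selection (not the paper's model).
# Assumptions: runtime = max(compute part scaled by clock, memory part),
# dynamic power ~ f * V^2 with V ~ f, hence roughly cubic in frequency.

def kernel_runtime(f, f_max, t_compute, t_memory):
    """Runtime of one kernel at clock f (compute part slows down, memory part doesn't)."""
    return max(t_compute * f_max / f, t_memory)

def kernel_energy(f, f_max, t_compute, t_memory, p_max):
    """Energy of one kernel at clock f under a cubic dynamic-power model."""
    power = p_max * (f / f_max) ** 3
    return power * kernel_runtime(f, f_max, t_compute, t_memory)

def best_frequency(f_levels, f_max, t_compute, t_memory, p_max, max_slowdown=1.01):
    """Pick the clock level minimizing energy within a slowdown budget vs. f_max."""
    t_base = kernel_runtime(f_max, f_max, t_compute, t_memory)
    candidates = [
        f for f in f_levels
        if kernel_runtime(f, f_max, t_compute, t_memory) <= max_slowdown * t_base
    ]
    return min(candidates, key=lambda f: kernel_energy(f, f_max, t_compute, t_memory, p_max))
```

Under this toy model, a memory-bound kernel (small `t_compute`) tolerates a much lower clock at essentially no runtime cost, while a compute-bound kernel must stay near `f_max`; this is the intuition behind why per-kernel settings can beat a single pass- or iteration-level frequency.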
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Energy Efficiency
Compute Waste
Dynamic Voltage and Frequency Scaling
LLM Operations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Kernel-Level DVFS
Energy Efficiency
Large Language Models
Compute Waste Reduction
Fine-Grained Frequency Scaling