The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization

📅 2025-05-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Energy consumption during inference of generative AI services has become a critical bottleneck for real-world deployment, yet existing research lacks systematic measurement and optimization methodologies. This paper introduces ML.ENERGY, a benchmarking framework dedicated to inference energy efficiency for generative AI services. It establishes four core design principles and delivers an open-source toolchain supporting hardware-level power monitoring, service-oriented workload simulation, and standardized evaluation across diverse models and tasks. An automated optimization recommendation mechanism is proposed, achieving energy savings of sometimes more than 40% without altering model outputs. The authors empirically evaluate 40 mainstream models across six task categories, release the ML.ENERGY Leaderboard, and uncover significant impacts of design choices (including quantization, attention mechanisms, and decoding strategies) on energy consumption. ML.ENERGY provides a reproducible, scalable foundation for evaluating and optimizing green AI systems.

📝 Abstract
As the adoption of Generative AI in real-world services grows explosively, energy has emerged as a critical bottleneck resource. However, energy remains a metric that is often overlooked, under-explored, or poorly understood in the context of building ML systems. We present the ML.ENERGY Benchmark, a benchmark suite and tool for measuring inference energy consumption under realistic service environments, and the corresponding ML.ENERGY Leaderboard, which have served as a valuable resource for those hoping to understand and optimize the energy consumption of their generative AI services. In this paper, we explain four key design principles for benchmarking ML energy we have acquired over time, and then describe how they are implemented in the ML.ENERGY Benchmark. We then highlight results from the latest iteration of the benchmark, including energy measurements of 40 widely used model architectures across 6 different tasks, case studies of how ML design choices impact energy consumption, and how automated optimization recommendations can lead to significant (sometimes more than 40%) energy savings without changing what is being computed by the model. The ML.ENERGY Benchmark is open-source and can be easily extended to various customized models and application scenarios.
Problem

Research questions and friction points this paper is trying to address.

Measuring inference energy consumption in ML systems
Optimizing energy use for generative AI services
Automating energy-saving recommendations for ML models
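At its core, measuring inference energy comes down to sampling device power during an inference window and integrating it over time. The sketch below is a minimal illustration of that idea, not the ML.ENERGY toolchain itself (which reads real GPU power counters); the sample data is hypothetical.

```python
def integrate_energy(power_samples):
    """Integrate (timestamp_s, watts) samples into joules via the trapezoidal rule."""
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(power_samples, power_samples[1:]):
        joules += 0.5 * (p0 + p1) * (t1 - t0)
    return joules

# Hypothetical samples: 2 seconds of inference at a steady 250 W draw.
samples = [(0.0, 250.0), (1.0, 250.0), (2.0, 250.0)]
print(integrate_energy(samples))  # 500.0 (joules)
```

In a real setup the samples would come from a hardware power counter (e.g. via NVML on NVIDIA GPUs), and the window would bracket exactly the inference request being measured.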
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark suite for ML inference energy measurement
Automated optimization recommendations for energy savings
Open-source tool for diverse models and scenarios
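One way an optimization recommender of this kind can save energy without changing model outputs is by profiling a knob such as the GPU power limit and picking the setting that minimizes energy per request (average power times latency). The sketch below illustrates that selection step under assumed, hypothetical profiling numbers; it is a simplified stand-in, not the paper's actual recommendation algorithm.

```python
def recommend_power_limit(measurements):
    """Pick the power limit minimizing energy per request.

    measurements: dict mapping power limit (W) -> (avg_power_W, latency_s).
    Energy per request is approximated as avg_power_W * latency_s.
    """
    def energy(limit):
        avg_power, latency = measurements[limit]
        return avg_power * latency
    return min(measurements, key=energy)

# Hypothetical profiling data: lowering the limit cuts power faster than
# it inflates latency, up to a point.
profile = {
    300: (290.0, 1.00),  # 290 J per request
    250: (245.0, 1.05),  # ~257 J
    200: (195.0, 1.20),  # 234 J  <- lowest energy
    150: (148.0, 1.80),  # ~266 J
}
print(recommend_power_limit(profile))  # 200
```

Because the knob only throttles how fast the hardware runs, not what it computes, the model's outputs are unchanged, which matches the paper's claim of savings "without changing what is being computed by the model".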