Guidelines for the Quality Assessment of Energy-Aware NAS Benchmarks

📅 2025-05-21
🤖 AI Summary
Problem: Energy-aware neural architecture search (NAS) benchmarks depend on GPU power readings whose quality varies with the measurement API, with faulty estimates in the low-power range. Method: The paper proposes three design principles for such benchmarks (reliable power measurements, a wide range of GPU usage, and holistic cost reporting) and analyses EA-HAS-Bench against them. It finds that querying power through the Nvidia System Management Interface (SMI) rather than its underlying library affects the sampling rate and yields faulty low-power estimates that correlate poorly with measurements from an external power meter. Contribution/Results: On the authors' four-GPU device, power ranges from 146 W to 305 W in a single-GPU setting and narrows further when all four GPUs are used. Calibration experiments over assumptions made in tools such as Code Carbon reduce the maximum inaccuracy of holistic energy reporting from 10.3 % to 8.9 % without, and to 6.6 % with, prior estimation of the expected load on the device.

📝 Abstract
Neural Architecture Search (NAS) accelerates progress in deep learning through systematic refinement of model architectures. The downside is increasingly large energy consumption during the search process. Surrogate-based benchmarking mitigates the cost of full training by querying a pre-trained surrogate to obtain an estimate for the quality of the model. Specifically, energy-aware benchmarking aims to make it possible for NAS to favourably trade off model energy consumption against accuracy. Towards this end, we propose three design principles for such energy-aware benchmarks: (i) reliable power measurements, (ii) a wide range of GPU usage, and (iii) holistic cost reporting. We analyse EA-HAS-Bench based on these principles and find that the choice of GPU measurement API has a large impact on the quality of results. Using the Nvidia System Management Interface (SMI) on top of its underlying library influences the sampling rate during the initial data collection, returning faulty low-power estimations. This results in poor correlation with accurate measurements obtained from an external power meter. With this study, we bring to attention several key considerations when performing energy-aware surrogate-based benchmarking and derive first guidelines that can help design novel benchmarks. We show a narrow usage range of the four GPUs attached to our device, ranging from 146 W to 305 W in a single-GPU setting, and narrowing down even further when using all four GPUs. To improve holistic energy reporting, we propose calibration experiments over assumptions made in popular tools, such as Code Carbon, thus achieving reductions in the maximum inaccuracy from 10.3 % to 8.9 % without and to 6.6 % with prior estimation of the expected load on the device.
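
The comparison the abstract describes, API-reported power against an external power meter, can be illustrated with a minimal sketch. It assumes that energy is obtained by integrating timestamped power samples with the trapezoidal rule and that inaccuracy is the relative deviation from the meter reading. The function names and the sample values below are hypothetical and chosen for illustration only. They are not the authors' code or data.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (W) over time (s) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    if len(timestamps_s) < 2:
        raise ValueError("need at least two samples to integrate")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def max_relative_inaccuracy(estimates: Sequence[float],
                            references: Sequence[float]) -> float:
    """Largest |estimate - reference| / reference over paired measurements."""
    if len(estimates) != len(references) or not estimates:
        raise ValueError("need equally sized, non-empty sequences")
    return max(abs(e - r) / r for e, r in zip(estimates, references))


# Hypothetical readings for illustration only (not data from the paper).
t = [0.0, 1.0, 2.0, 3.0]                     # seconds
api_power = [100.0, 110.0, 120.0, 110.0]     # W, as reported by a software API
meter_power = [110.0, 120.0, 130.0, 120.0]   # W, as read from an external meter

api_energy = energy_joules(t, api_power)
meter_energy = energy_joules(t, meter_power)
inaccuracy = max_relative_inaccuracy([api_energy], [meter_energy])
print(f"API: {api_energy:.1f} J, meter: {meter_energy:.1f} J, "
      f"relative inaccuracy: {inaccuracy:.1%}")
```

With real data, each experiment would contribute one (estimate, reference) pair, and the maximum over all pairs would play the role of the maximum inaccuracy figures quoted in the abstract, under the assumption that the paper defines inaccuracy in this relative sense.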
Problem

Research questions and friction points this paper is trying to address.

Assessing energy consumption in Neural Architecture Search benchmarks
Improving accuracy of GPU power measurement methods
Proposing holistic energy reporting for NAS benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Surrogate-based benchmarking reduces full training costs
Three design principles for energy-aware benchmarks: reliable power measurements, a wide range of GPU usage, and holistic cost reporting
Calibration experiments improve holistic energy reporting