Accelerated Training on Low-Power Edge Devices

📅 2025-02-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address slow training on power-constrained edge devices, this paper proposes a cross-layer framework that jointly optimizes GPU frequency and batch size. The method combines a batch-size efficiency prediction model with real-time device power measurements, enabling simultaneous minimization of training time and energy consumption under strict power budgets (a first in the literature). Through system-level parameter coordination and hardware-in-the-loop runtime optimization, it achieves a 2.4× reduction in training time and substantial energy savings on real edge platforms, without compromising model accuracy. The core innovation is embedding hardware power characteristics directly into the training-configuration decision loop, moving beyond conventional static hyperparameter tuning.

📝 Abstract
Training on edge devices poses several challenges, as these devices are generally resource-constrained, especially in terms of power. State-of-the-art techniques at the device level reduce the GPU frequency to enforce power constraints, leading to a significant increase in training time. To accelerate training, we propose to jointly adjust the system and application parameters (in our case, the GPU frequency and the batch size of the training task) while adhering to the power constraints on devices. We introduce a novel cross-layer methodology that combines predictions of batch size efficiency and device profiling to achieve the desired optimization. Our evaluation on real hardware shows that our method outperforms baselines that rely on state-of-the-art techniques, reducing the training time by 2.4× with results very close to optimal. Our measurements also indicate a substantial reduction in the overall energy used for the training process. These gains are achieved without any reduction in the performance of the trained model.
Problem

Research questions and friction points this paper is trying to address.

Accelerate training on edge devices
Optimize GPU frequency and batch size
Reduce training time and energy usage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Jointly adjust system and application parameters
Introduce novel cross-layer methodology
Reduce training time and energy usage
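The joint adjustment described above can be sketched as a search over (GPU frequency, batch size) pairs that satisfy a power budget, picking the pair with the lowest predicted training time. The frequency/batch candidates, power model, and latency model below are illustrative placeholders, not the paper's actual profiling data or prediction models:

```python
# Hypothetical sketch of the cross-layer idea: jointly pick GPU frequency
# and batch size under a device power budget. All numbers are invented.

GPU_FREQS_MHZ = [300, 600, 900, 1200]   # candidate GPU frequencies
BATCH_SIZES = [16, 32, 64, 128]         # candidate training batch sizes
POWER_BUDGET_W = 10.0                   # device-level power constraint

def predicted_power(freq_mhz, batch):
    """Toy power model: draw grows with frequency and batch size."""
    return 2.0 + 0.005 * freq_mhz + 0.02 * batch

def predicted_epoch_time(freq_mhz, batch, samples=50_000):
    """Toy latency model: larger batches amortize per-step overhead,
    higher frequency shortens the compute portion of each step."""
    steps = samples / batch
    step_time = batch / freq_mhz + 0.05  # compute time + fixed overhead
    return steps * step_time

def best_config():
    """Exhaustively search feasible configurations (the paper instead
    uses efficiency predictions and device profiling to prune this)."""
    feasible = [
        (f, b)
        for f in GPU_FREQS_MHZ
        for b in BATCH_SIZES
        if predicted_power(f, b) <= POWER_BUDGET_W
    ]
    return min(feasible, key=lambda fb: predicted_epoch_time(*fb))
```

With these toy models, the highest frequency and largest batch together exceed the budget, so the search trades a step down in frequency for a larger, more efficient batch rather than simply throttling frequency alone.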
Mohamed Aboelenien Ahmed
Karlsruhe Institute of Technology
Kilian Pfeiffer
Karlsruhe Institute of Technology
Heba Khdr
Karlsruhe Institute of Technology
Osama Abboud
Senior Research Engineer, Huawei Technologies
R. Khalili
Huawei Research Center Munich
Jörg Henkel
Professor of Computer Science, Karlsruhe Institute of Technology
Embedded Systems · Systems-on-Chip · Dependable Systems · Low Power Design · Thermal Design