🤖 AI Summary
This work addresses the energy, latency, and memory constraints of object detection on heterogeneous edge devices by proposing an energy-adaptive neural architecture search (NAS) framework. The approach combines an energy-aware XiResOFA search space, a two-stage energy estimator, and an iterative search strategy to identify a single energy-efficient base architecture. Compound scaling then turns this base design into the XiYOLO family of models for different deployment budgets. The estimator adapts to a new device with only 2-20 target-device samples, which enables an interpretable trade-off between accuracy and energy consumption and eases a limitation of existing energy-aware NAS methods: they often target limited deployment settings, and real energy is device-dependent and costly to measure. Experiments on PascalVOC, COCO, and real-device deployment show that XiYOLO achieves a stronger energy-accuracy trade-off than YOLO baselines: on PascalVOC, the medium model reaches 86.15 mAP50 while using 20.6% less energy on GPU and 35.9% less on NPU than YOLOv12m; on COCO, the small model reduces energy relative to YOLOv12 by up to 53.7% on GPU and 51.6% on NPU.
📝 Abstract
Object detection on heterogeneous edge devices must satisfy strict energy, latency, and memory constraints while still providing reliable perception for downstream autonomy. Existing energy-aware NAS methods often target limited deployment settings, while real energy remains difficult to optimize because it is highly device-dependent and costly to measure. We address these challenges with an energy-adaptive framework that combines an energy-aware XiResOFA search space, a two-stage energy estimator, and iterative search to identify a single energy-efficient base architecture. We then apply compound scaling to transform this base design into the XiYOLO family across deployment budgets, enabling interpretable accuracy-energy tradeoffs under sparse hardware measurements. Experiments on PascalVOC, COCO, and real-device deployment show that XiYOLO achieves a stronger energy-accuracy tradeoff than YOLO baselines. On PascalVOC, the medium XiYOLO model reaches 86.15 mAP50 while reducing energy relative to YOLOv12m by 20.6% on GPU and 35.9% on NPU. On COCO, XiYOLO reduces energy relative to YOLOv12 by up to 53.7% on GPU and 51.6% on NPU at the small scale. The proposed two-stage estimator also improves sample efficiency over a joint predictor under few-shot adaptation with only 2-20 target-device samples.
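As a rough illustration of how compound scaling can turn one base architecture into a family of models, the sketch below uses the general EfficientNet-style rule in which depth, width, and input resolution are each multiplied by a fixed coefficient raised to a shared exponent `phi`. The abstract does not give XiYOLO's scaling rule, so everything here is an assumption made for illustration: the `ArchConfig` fields, the base values, the coefficients `alpha`, `beta`, `gamma`, and the mapping from size names to `phi` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchConfig:
    """Hypothetical description of a detector backbone (not the paper's)."""
    depth: int       # blocks per stage
    width: int       # base channel count
    resolution: int  # input image side length in pixels


def round_to_multiple(value: float, multiple: int) -> int:
    """Round to the nearest multiple, never going below one multiple."""
    return max(multiple, int(value / multiple + 0.5) * multiple)


def compound_scale(base: ArchConfig, phi: float,
                   alpha: float = 1.2, beta: float = 1.1,
                   gamma: float = 1.15) -> ArchConfig:
    """Scale depth, width, and resolution together with one exponent phi.

    alpha, beta, gamma are illustrative coefficients, not values from the paper.
    """
    return ArchConfig(
        depth=max(1, math.ceil(base.depth * alpha ** phi)),
        width=round_to_multiple(base.width * beta ** phi, 8),
        resolution=round_to_multiple(base.resolution * gamma ** phi, 32),
    )


# Hypothetical base design and size-to-phi mapping.
BASE = ArchConfig(depth=3, width=64, resolution=640)
FAMILY = {name: compound_scale(BASE, phi)
          for name, phi in [("n", -2), ("s", -1), ("m", 0), ("l", 1)]}

if __name__ == "__main__":
    for name, cfg in FAMILY.items():
        print(name, cfg)
```

Tying all three dimensions to a single exponent means each family member is defined by one number, which is one way to keep the accuracy-energy trade-off across deployment budgets easy to interpret.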