Hierarchical adaptive control for real-time dynamic inference at the edge

📅 2026-04-29

🤖 AI Summary

This work addresses the challenge of achieving an optimal trade-off among latency, energy consumption, and accuracy in dynamic machine learning on edge devices, where both data distribution shifts and resource fluctuations are prevalent. To this end, the authors propose a two-tier adaptive architecture: a global scheduler deploys a lightweight cascade of expert and general-purpose models that adheres to system constraints, while a local controller continuously monitors data drift and hardware conditions to dynamically activate or deactivate expert models for improved inference efficiency. The key contributions include a formalized budget-constrained cascaded model formulation and a hierarchical control mechanism, both validated on embedded platforms. Experimental results demonstrate that, under distribution shifts, the approach reduces per-inference latency by up to 2.45× and energy consumption by up to 2.86× compared to static baselines, with less than 4% accuracy degradation.
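As a rough illustration of the budget-constrained cascade idea, the sketch below assumes the worst case is an input that every enabled expert defers on, so that the general-purpose model must also run; the worst-case latency is then the sum of all stage latencies. The names `Stage`, `worst_case_latency_ms`, `total_memory_kb`, and `fits_budget`, as well as all latency and memory figures, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float  # assumed per-inference latency on the target node
    memory_kb: float   # assumed resident memory footprint


def worst_case_latency_ms(experts, generalist):
    # Assumed worst case: every expert runs and defers, then the generalist runs.
    return sum(s.latency_ms for s in experts) + generalist.latency_ms


def total_memory_kb(experts, generalist):
    # All stages are assumed to stay resident in memory at the same time.
    return sum(s.memory_kb for s in experts) + generalist.memory_kb


def fits_budget(experts, generalist, max_latency_ms, max_memory_kb):
    # A cascade is deployable only if it respects both budgets.
    return (worst_case_latency_ms(experts, generalist) <= max_latency_ms
            and total_memory_kb(experts, generalist) <= max_memory_kb)


experts = [Stage("expert_a", 2.0, 40.0), Stage("expert_b", 3.0, 60.0)]
generalist = Stage("generalist", 10.0, 300.0)

print(fits_budget(experts, generalist, max_latency_ms=16.0, max_memory_kb=512.0))  # True
print(fits_budget(experts, generalist, max_latency_ms=14.0, max_memory_kb=512.0))  # False
```

With these made-up numbers the worst-case path costs 15 ms, so the cascade fits a 16 ms budget but not a 14 ms one; a scheduler in this spirit would drop or swap experts until the check passes.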

📝 Abstract

Industrial systems increasingly depend on Machine Learning (ML) and operate on heterogeneous nodes that must satisfy tight latency, energy, and memory constraints. Dynamic ML models, which reconfigure their computational footprint at runtime, promise high energy efficiency and lower average latency for modest accuracy tradeoffs; however, their deployment is complex due to the additional hyperparameters they rely on. These hyperparameters, which control the tradeoff between accuracy and average latency, are often tuned on a calibration dataset that must match the test-time distribution. This assumption rarely holds in real-world scenarios, leading to suboptimal operating conditions, possibly worse than those of static models. We propose a two-tier adaptive architecture that co-optimizes model and system decisions. At the global level, a scheduler configures and deploys, for each edge node, a cascade of classifiers composed of lightweight specialized models and a generalist fallback, satisfying latency and memory constraints. At the node level, a local controller tracks data drift and hardware resources, enabling or disabling specialized predictors (SPs) to preserve high energy efficiency and avoid latency-constraint violations under varying conditions. This design allows longer operating times without forcing a global redeployment step, and enables efficient execution when the remote global controller is unreachable. We evaluate the approach on two datasets under controlled distribution mismatch scenarios, showing average per-inference reductions of up to 2.45x in latency and up to 2.86x in energy, with less than 4% accuracy drop compared to static baselines. Our contributions are: (1) a budgeted SP-cascade formulation that preserves worst-case latency constraints; (2) a hierarchical controller that maintains efficiency under data and resource changes; and (3) an experimental evaluation on embedded hardware.
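To make the node-level idea concrete, here is a minimal sketch of one possible enable/disable rule for a single SP; it is not the paper's controller. It assumes the SP either accepts an input (answers confidently) or defers to the generalist, so with acceptance rate a the expected latency is sp + (1 - a) * gen, which beats running the generalist alone only when a > sp / gen. The class `LocalController`, its parameters, and the latency values are hypothetical, and hardware-resource monitoring is left out.

```python
from collections import deque


class LocalController:
    """Hypothetical node-level controller for one specialized predictor (SP)."""

    def __init__(self, sp_latency_ms, generalist_latency_ms, window=50):
        # Break-even acceptance rate: the SP pays off only if
        # sp + (1 - a) * gen < gen, which simplifies to a > sp / gen.
        self.break_even = sp_latency_ms / generalist_latency_ms
        self.accepted = deque(maxlen=window)  # sliding window of 0/1 outcomes
        self.sp_enabled = True

    def observe(self, sp_accepted):
        # Record whether the SP answered confidently on the latest input.
        self.accepted.append(1 if sp_accepted else 0)
        # Re-evaluate only once the window is full, to avoid noisy early decisions.
        if len(self.accepted) == self.accepted.maxlen:
            rate = sum(self.accepted) / len(self.accepted)
            self.sp_enabled = rate > self.break_even
        return self.sp_enabled


ctrl = LocalController(sp_latency_ms=2.0, generalist_latency_ms=10.0, window=10)
for _ in range(10):
    ctrl.observe(True)   # matched distribution: the SP accepts every input
print(ctrl.sp_enabled)   # True
for _ in range(10):
    ctrl.observe(False)  # drift: the SP defers on every input
print(ctrl.sp_enabled)   # False
```

One limitation of this sketch is that a disabled SP produces no new observations, so a practical controller would need to probe the SP occasionally (for example on a small fraction of inputs) to detect when re-enabling it becomes worthwhile.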

Problem

Research questions and friction points this paper is trying to address.

dynamic inference

edge computing

data drift

latency constraints

energy efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical adaptive control

dynamic inference

specialized predictor cascade