🤖 AI Summary
This paper addresses the cascaded inference decision problem in edge intelligence scenarios, where the objective is to dynamically balance model accuracy and error probability across multiple models to minimize cumulative regret. We formulate each “arm” as an inference model characterized by its accuracy and error probability, embedded within a cascaded feedback structure. To this end, we propose an adaptive online learning–based decision framework. Theoretically, we prove that both the Lower Confidence Bound (LCB) and Thompson Sampling strategies achieve *O*(1) constant regret, substantially outperforming static or phased strategies such as Explore-then-Commit and Action Elimination. Our analysis further reveals that adaptive confidence updating is critical for overcoming the limitations of fixed-order execution. Extensive simulations validate the framework’s efficiency and robustness under uncertain edge environments.
📝 Abstract
Motivated by the challenges of edge inference, we study a variant of the cascade bandit model in which each arm corresponds to an inference model with an associated accuracy and error probability. We analyse four decision-making policies: Explore-then-Commit, Action Elimination, Lower Confidence Bound (LCB), and Thompson Sampling. For each, we provide sharp theoretical regret guarantees. Unlike in classical bandit settings, Explore-then-Commit and Action Elimination incur suboptimal regret because they commit to a fixed ordering after the exploration phase, limiting their ability to adapt. In contrast, LCB and Thompson Sampling continuously update their decisions based on observed feedback, achieving constant O(1) regret. Simulations corroborate these theoretical findings, highlighting the crucial role of adaptivity for efficient edge inference under uncertainty.
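To illustrate the adaptivity distinction the abstract draws, the sketch below simulates the two adaptive policies, LCB and Thompson Sampling, on a simplified bandit where each arm is an inference model with an unknown error probability. This is a minimal single-selection sketch, not the paper's method: the cascaded feedback structure is omitted, and the arm count, error probabilities, confidence radius, and function names are all illustrative assumptions.

```python
import math
import random


def lcb_arm(errors, pulls, t):
    """Pick the arm with the smallest lower confidence bound on its
    error probability (optimism under minimisation)."""
    for i, n in enumerate(pulls):
        if n == 0:
            return i  # play every arm once before trusting the bounds

    def score(i):
        mean = errors[i] / pulls[i]  # empirical error rate
        # Hoeffding-style radius (an illustrative choice, not the paper's)
        radius = math.sqrt(math.log(t + 2) / (2 * pulls[i]))
        return mean - radius

    return min(range(len(pulls)), key=score)


def ts_arm(errors, pulls, rng):
    """Thompson Sampling: sample each arm's error rate from a Beta
    posterior and pick the arm with the smallest sampled rate."""
    def sample(i):
        return rng.betavariate(1 + errors[i], 1 + pulls[i] - errors[i])

    return min(range(len(pulls)), key=sample)


def run(policy, error_probs, horizon, seed=0):
    """Simulate one policy; return (cumulative regret, pull counts)."""
    rng = random.Random(seed)
    k = len(error_probs)
    errors, pulls = [0] * k, [0] * k
    best = min(error_probs)
    regret = 0.0
    for t in range(horizon):
        if policy == "lcb":
            arm = lcb_arm(errors, pulls, t)
        else:
            arm = ts_arm(errors, pulls, rng)
        # Bernoulli feedback: did the chosen model err on this request?
        errors[arm] += 1 if rng.random() < error_probs[arm] else 0
        pulls[arm] += 1
        regret += error_probs[arm] - best  # expected (pseudo-)regret
    return regret, pulls
```

Because both policies keep updating their estimates from every round of feedback, the pull counts concentrate on the lowest-error model and regret growth flattens; a static policy that freezes its ordering after exploration cannot recover from an unlucky exploration phase in the same way.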