Cascading Bandits With Feedback

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the cascaded inference decision problem in edge intelligence, where the goal is to dynamically trade off model accuracy against error probability across multiple models so as to minimize cumulative regret. Each “arm” is formulated as an inference model characterized by its accuracy and error probability, embedded in a cascaded feedback structure, and an adaptive online learning–based decision framework is proposed for this setting. Theoretically, we prove that both the Lower Confidence Bound (LCB) and Thompson Sampling strategies achieve *O*(1) constant regret, substantially outperforming static or phased strategies such as Explore-then-Commit and Action Elimination. Our analysis further shows that adaptive confidence updating is critical for overcoming the limitations of a fixed execution order. Extensive simulations validate the framework’s efficiency and robustness in uncertain edge environments.

📝 Abstract
Motivated by the challenges of edge inference, we study a variant of the cascade bandit model in which each arm corresponds to an inference model with an associated accuracy and error probability. We analyse four decision-making policies (Explore-then-Commit, Action Elimination, Lower Confidence Bound (LCB), and Thompson Sampling) and provide sharp theoretical regret guarantees for each. Unlike in classical bandit settings, Explore-then-Commit and Action Elimination incur suboptimal regret because they commit to a fixed ordering after the exploration phase, limiting their ability to adapt. In contrast, LCB and Thompson Sampling continuously update their decisions based on observed feedback, achieving constant O(1) regret. Simulations corroborate these theoretical findings, highlighting the crucial role of adaptivity for efficient edge inference under uncertainty.
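The abstract contrasts fixed-ordering policies with confidence-bound methods that keep adapting. As a rough illustration of the LCB idea in this minimization setting (pick the model whose *lower* confidence bound on error probability is smallest), here is a minimal sketch; the bonus form, the Bernoulli error model, and the simulation setup are illustrative assumptions, not taken from the paper.

```python
import math
import random

def lcb_select(counts, err_sums, t):
    """Pick the arm with the smallest lower confidence bound on its
    estimated error probability (optimism when minimizing error)."""
    best, best_lcb = None, float("inf")
    for a in range(len(counts)):
        if counts[a] == 0:
            return a  # try every arm at least once
        mean = err_sums[a] / counts[a]            # empirical error rate
        bonus = math.sqrt(2 * math.log(t + 1) / counts[a])
        if mean - bonus < best_lcb:
            best, best_lcb = a, mean - bonus
    return best

def run_lcb(true_err, horizon, seed=0):
    """Simulate LCB on Bernoulli error feedback; return cumulative regret."""
    rng = random.Random(seed)
    k = len(true_err)
    counts, err_sums = [0] * k, [0.0] * k
    regret, best = 0.0, min(true_err)
    for t in range(horizon):
        a = lcb_select(counts, err_sums, t)
        err = 1.0 if rng.random() < true_err[a] else 0.0  # observed mistake?
        counts[a] += 1
        err_sums[a] += err
        regret += true_err[a] - best  # pay the gap to the best model
    return regret
```

Because the bonus shrinks as an arm is sampled, suboptimal models are eventually ruled out while the decision is revisited at every round, which is the adaptivity the abstract credits for the O(1) behavior.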
Problem

Research questions and friction points this paper is trying to address.

Optimizing model selection for edge inference under uncertainty
Analyzing adaptive decision policies to minimize regret in cascading bandits
Evaluating bandit algorithms for accuracy-error tradeoffs in edge computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cascade bandit model for edge inference optimization
LCB and Thompson Sampling achieve constant regret
Continuous decision updates based on observed feedback
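The Thompson Sampling counterpart to the bullet above can be sketched with Beta posteriors over each model's error probability, choosing the model whose sampled error is lowest; the Beta(1+errors, 1+successes) prior/posterior and the two-model example are assumptions for illustration only.

```python
import random

def thompson_select(errs, oks, rng):
    """Sample an error rate from each model's Beta posterior and pick
    the model whose sampled error probability is smallest."""
    samples = [rng.betavariate(1 + e, 1 + o) for e, o in zip(errs, oks)]
    return min(range(len(samples)), key=lambda a: samples[a])

def run_ts(true_err, horizon, seed=0):
    """Simulate Thompson Sampling; return how often each model was chosen."""
    rng = random.Random(seed)
    k = len(true_err)
    errs, oks = [0] * k, [0] * k   # observed mistakes / correct inferences
    pulls = [0] * k
    for _ in range(horizon):
        a = thompson_select(errs, oks, rng)
        if rng.random() < true_err[a]:
            errs[a] += 1           # inference error observed
        else:
            oks[a] += 1            # correct inference observed
        pulls[a] += 1
    return pulls
```

As the posteriors concentrate, samples for high-error models rarely win the comparison, so play concentrates on the best model without ever freezing the ordering.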