Does Feedback Help in Bandits with Arm Erasures?

📅 2025-04-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies a distributed multi-armed bandit problem over an erasure channel with communication constraints: a learner sends arm-selection commands to an agent over a channel with erasure probability ε; upon an erasure, the agent repeats the last successfully received arm, and the learner always observes the reward of the arm actually pulled. The central question is whether *erasure feedback*—agent-to-learner acknowledgment of whether a command was received—improves the worst-case regret. The paper proves a regret lower bound of Ω(√(KT) + K/(1−ε)) that holds even with feedback and matches no-feedback upper bounds up to logarithmic factors, so feedback cannot improve the order of the regret; it can only affect constant factors. Exploiting feedback, the paper designs a simpler algorithm that retains the optimal regret order while potentially achieving better constants, and evaluates its performance numerically.

📝 Abstract
We study a distributed multi-armed bandit (MAB) problem over arm erasure channels, motivated by the increasing adoption of MAB algorithms over communication-constrained networks. In this setup, the learner communicates the chosen arm to play to an agent over an erasure channel with erasure probability $\epsilon \in [0,1)$; if an erasure occurs, the agent continues pulling the last successfully received arm; the learner always observes the reward of the arm pulled. In past work, we considered the case where the agent cannot convey feedback to the learner, and thus the learner does not know whether the arm played is the requested or the last successfully received one. In this paper, we instead consider the case where the agent can send feedback to the learner on whether the arm request was received, and thus the learner exactly knows which arm was played. Surprisingly, we prove that erasure feedback does not improve the worst-case regret upper bound order over the previously studied no-feedback setting. In particular, we prove a regret lower bound of $\Omega(\sqrt{KT} + K/(1-\epsilon))$, where $K$ is the number of arms and $T$ the time horizon, that matches no-feedback upper bounds up to logarithmic factors. We note however that the availability of feedback enables simpler algorithm designs that may achieve better constants (albeit not better order) regret bounds; we design one such algorithm and evaluate its performance numerically.
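The interaction model in the abstract can be sketched in simulation. The following is a minimal illustrative sketch, not the paper's algorithm: the learner policy (round-robin exploration followed by greedy commitment), the Bernoulli arm means, and all parameter names are assumptions made here for illustration. What it does show faithfully is the channel mechanics from the abstract: each command is erased with probability ε, on erasure the agent repeats the last successfully received arm, and with feedback the learner knows exactly which arm was played and credits the reward to it.

```python
import random

def simulate(K=5, T=2000, eps=0.3, seed=0):
    """Toy simulation of the arm-erasure bandit with feedback.

    Illustrative sketch only: the explore-then-greedy learner below is a
    stand-in policy, not the algorithm proposed in the paper.
    """
    rng = random.Random(seed)
    means = [rng.random() for _ in range(K)]  # hidden Bernoulli arm means
    last_received = 0      # assumption: agent starts on arm 0
    counts = [0] * K       # pulls credited per arm (feedback makes this exact)
    sums = [0.0] * K       # reward totals per arm
    total = 0.0
    for t in range(T):
        # Learner picks a request: round-robin exploration, then greedy.
        if t < 20 * K:
            request = t % K
        else:
            request = max(range(K),
                          key=lambda a: sums[a] / counts[a] if counts[a] else 0.0)
        # Erasure channel: the command is lost with probability eps.
        if rng.random() >= eps:
            last_received = request          # command delivered
        played = last_received               # on erasure: repeat last received arm
        # Feedback: learner knows `played` exactly, so statistics stay clean.
        reward = 1.0 if rng.random() < means[played] else 0.0
        counts[played] += 1
        sums[played] += reward
        total += reward
    regret = max(means) * T - total
    return regret, counts
```

Without feedback, the learner could not be sure whether `played` equals `request`, so reward statistics would have to account for possibly mis-attributed pulls; with feedback, the update step above is exact, which is the simplification the paper exploits.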
Problem

Research questions and friction points this paper is trying to address.

Study impact of feedback in bandits with arm erasures
Compare regret bounds with and without erasure feedback
Design algorithms leveraging feedback for better performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses erasure feedback in bandit algorithms
Analyzes regret bounds with feedback
Proposes simpler algorithm designs
Merve Karakas
University of California, Los Angeles
Osama A. Hanna
Meta, GenAI
Lin F. Yang
University of California, Los Angeles
Christina Fragouli
UCLA
Network coding, wireless networks, physical layer security, algorithms for networking