🤖 AI Summary
This work investigates the minimization of satisficing regret in non-stationary multi-armed bandits, revealing that even a single environment change, i.e., a piecewise-stationary setting with \(L \geq 2\) stationary segments, forces the optimal satisficing regret to grow with the time horizon \(T\): it cannot remain bounded by a constant. To establish this result, we introduce a novel information-theoretic analysis framework based on Fano's inequality, featuring a "post-interaction reference" construction that rigorously generalizes both the classical Fano method and existing interactive Fano techniques to accommodate non-stationary feedback structures. Our theoretical analysis shows that the optimal satisficing regret scales as \(\Theta(L \log T)\) for \(L \geq 2\), in sharp contrast to the \(\Theta(1)\) bound achievable when \(L = 1\), and further identifies a special regime under which constant regret can be recovered.
📝 Abstract
Motivated by the principle of satisficing in decision-making, we study satisficing regret guarantees for nonstationary $K$-armed bandits. We show that in the general realizable, piecewise-stationary setting with $L$ stationary segments, the optimal regret is $\Theta(L\log T)$ as long as $L\geq 2$. This stands in sharp contrast to the case of $L=1$ (i.e., the stationary setting), where a $T$-independent $\Theta(1)$ satisficing regret is achievable under realizability. In other words, the optimal regret has to scale with $T$ even if just a little nonstationarity is present. A key ingredient in our analysis is a novel Fano-based framework tailored to nonstationary bandits via a \emph{post-interaction reference} construction. This framework strictly extends the classical Fano method for passive estimation as well as recent interactive Fano techniques for stationary bandits. As a complement, we also discuss a special regime in which constant satisficing regret is again possible.
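To make the setting concrete, the sketch below simulates a piecewise-stationary Bernoulli bandit with $L = 2$ segments and tallies satisficing regret, i.e., the shortfall $(S - \mu_{a_t})^+$ accrued only when the pulled arm's mean falls below the threshold $S$. The instance, the threshold, and the epsilon-greedy learner are all illustrative assumptions, not the paper's construction or algorithm; realizability here means each segment contains at least one arm with mean at least $S$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance (not from the paper): K arms, L = 2 segments,
# satisficing threshold S. Each segment has a satisficing arm (realizability).
K, T, S = 3, 1000, 0.5
means = [np.array([0.7, 0.4, 0.3]),   # segment 1: arm 0 satisfices
         np.array([0.3, 0.4, 0.7])]   # segment 2: arm 2 satisfices
change_point = T // 2                 # single change => L = 2 segments

counts = np.zeros(K)
sums = np.zeros(K)
regret = 0.0

for t in range(T):
    mu = means[0] if t < change_point else means[1]
    # Illustrative epsilon-greedy learner, oblivious to the change point.
    if counts.min() == 0 or rng.random() < 0.1:
        a = int(rng.integers(K))
    else:
        a = int(np.argmax(sums / counts))
    r = float(rng.random() < mu[a])   # Bernoulli reward
    counts[a] += 1
    sums[a] += r
    # Satisficing regret penalizes only pulls of arms below the threshold S.
    regret += max(S - mu[a], 0.0)

print(f"satisficing regret over T={T}: {regret:.1f}")
```

An oblivious learner like this one keeps paying after the change point; the paper's point is that even change-aware strategies cannot avoid regret growing as $\Theta(L\log T)$ once $L \geq 2$.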