Invariance-Based Dynamic Regret Minimization

📅 2026-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses stochastic non-stationary linear bandits, where time-varying parameters force conventional methods to prematurely discard historical data and thereby lose valuable information. To mitigate this, the authors decompose the reward model into stationary and non-stationary components and introduce invariance modeling, which leverages stable structure in historical data to reduce the effective problem dimensionality, within a dynamic regret minimization framework. The proposed ISD-linUCB algorithm combines contextual linear modeling, dynamic weighting, and invariance identification to enable efficient online decision-making in non-stationary environments. Both theoretical analysis and empirical experiments show that the method achieves significantly lower dynamic regret than existing approaches, particularly in rapidly changing settings with sufficient historical data.

📝 Abstract
We consider stochastic non-stationary linear bandits where the linear parameter connecting contexts to the reward changes over time. Existing algorithms in this setting localize the policy by gradually discarding or down-weighting past data, effectively shrinking the time horizon over which learning can occur. However, in many settings historical data may still carry partial information about the reward model. We propose to leverage such data while adapting to changes, by assuming the reward model decomposes into stationary and non-stationary components. Based on this assumption, we introduce ISD-linUCB, an algorithm that uses past data to learn invariances in the reward model and subsequently exploits them to improve online performance. We show both theoretically and empirically that leveraging invariance reduces the problem dimensionality, yielding significant regret improvements in fast-changing environments when sufficient historical data is available.
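The paper does not spell out ISD-linUCB here, but the abstract names its ingredients: identify invariant (stationary) directions of the reward parameter from historical data, then run a discounted LinUCB that only tracks the remaining non-stationary directions. The sketch below illustrates that idea on toy data; the variance-based invariance test, the discount scheme, and all names (`choose`, `update`, `gamma`, `alpha`) are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 3  # ambient dimension and assumed number of non-stationary directions

# --- Invariance identification from historical data (illustrative heuristic) ---
# Fit per-epoch parameter estimates on historical batches; coordinates whose
# estimates barely vary across epochs are treated as (approximately) stationary.
hist_thetas = rng.normal(size=(20, d))
hist_thetas[:, k:] = hist_thetas[0, k:]      # toy data: last d-k coords stationary
variances = hist_thetas.var(axis=0)
moving = np.argsort(variances)[-k:]          # coordinates flagged as non-stationary

theta_stat = hist_thetas.mean(axis=0)        # invariant part learned offline
theta_stat[moving] = 0.0                     # keep only the stationary coordinates

# --- Discounted LinUCB restricted to the k moving directions ---
gamma, lam, alpha = 0.95, 1.0, 1.0           # discount, ridge, exploration (assumed)
A = lam * np.eye(k)
b = np.zeros(k)

def choose(contexts):
    """Pick the arm maximizing a UCB over the k moving coordinates;
    the stationary part contributes a known mean offset."""
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    scores = []
    for x in contexts:
        z = x[moving]                        # reduced (k-dimensional) context
        mean = x @ theta_stat + z @ theta_hat
        bonus = alpha * np.sqrt(z @ A_inv @ z)
        scores.append(mean + bonus)
    return int(np.argmax(scores))

def update(x, reward):
    """Discounted ridge update: only the residual left after subtracting the
    stationary component is explained by the time-varying parameter."""
    global A, b
    z = x[moving]
    residual = reward - x @ theta_stat
    A = gamma * A + np.outer(z, z) + (1 - gamma) * lam * np.eye(k)
    b = gamma * b + residual * z

contexts = rng.normal(size=(5, d))
arm = choose(contexts)
update(contexts[arm], 1.0)
```

The dimensionality reduction claimed in the abstract shows up here as the exploration bonus and the covariance matrix living in k dimensions instead of d, which is where a regret improvement in fast-changing environments would come from under these assumptions.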
Problem

Research questions and friction points this paper is trying to address.

non-stationary linear bandits
dynamic regret
invariance
historical data
reward model
Innovation

Methods, ideas, or system contributions that make the work stand out.

invariance
non-stationary bandits
dynamic regret
dimensionality reduction
ISD-linUCB