Information-Theoretic Generalization Bounds for Sequential Decision Making

📅 2026-05-12

🤖 AI Summary
Existing information-theoretic generalization bounds are hard to apply directly to data-adaptive sequential decision problems such as online learning, streaming active learning, and multi-armed bandits. This work proposes a sequential supersample framework built on a row-wise exchangeability assumption. The framework separates the learner's filtration from the proof-side enlargement used for comparisons with ghost coordinates. The resulting bounds are stated in terms of sequential conditional mutual information (SCMI), a sum over rounds of information terms between each round's selector and the losses. The authors pair this selector–SCMI proof strategy with a Bernstein-type refinement that yields faster rates under suitable variance conditions. Together, these results extend information-theoretic generalization analysis, previously developed mainly for the batch i.i.d. setting, to online learning, importance-weighted streaming active learning, and stochastic multi-armed bandits.
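For orientation, the static starting point is the batch supersample CMI bound of Steinke and Zakynthinou (2020). It is stated here from that literature, not as this paper's sequential theorem. The construction has four parts:

- Draw a supersample $\tilde Z \in \mathcal{Z}^{n\times 2}$ of i.i.d. samples.
- Draw an independent uniform selector $U \in \{0,1\}^n$.
- Train on $S = (\tilde Z_{i,U_i})_{i=1}^{n}$ to obtain $W$.
- Treat $\tilde Z_{i,1-U_i}$ as the ghost coordinates.

For a loss with values in $[0,1]$, with CMI measured in nats:

$$
\mathbb{E}[\mathrm{gen}] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\big[\ell(W,\tilde Z_{i,1-U_i}) - \ell(W,\tilde Z_{i,U_i})\big],
\qquad
\big|\mathbb{E}[\mathrm{gen}]\big| \le \sqrt{\frac{2\, I(W;U \mid \tilde Z)}{n}}.
$$

The sequential bounds described above replace the single term $I(W;U \mid \tilde Z)$ with a sum of roundwise selector–loss information terms. Their exact form is given in the paper and is not reproduced here.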
📝 Abstract
Information-theoretic generalization bounds based on the supersample construction are a central tool for algorithm-dependent generalization analysis in the batch i.i.d. setting. However, existing supersample conditional mutual information (CMI) bounds do not directly apply to sequential decision-making problems such as online learning, streaming active learning, and bandits, where data are revealed adaptively and the learner evolves along a causal trajectory. To address this limitation, we develop a sequential supersample framework that separates the learner filtration from a proof-side enlargement used for ghost-coordinate comparisons. Under a row-wise exchangeability assumption, the sequential generalization gap is controlled by sequential CMI (SCMI), a sum of roundwise selector–loss information terms. We also establish a Bernstein-type refinement that yields faster rates under suitable variance conditions. The selector–SCMI proof strategy applies to online learning, streaming active learning with importance weighting, and stochastic multi-armed bandits.
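The following minimal Python sketch makes the batch supersample construction concrete. It is the i.i.d. version the abstract starts from, not the paper's sequential framework. Each row of the supersample holds two samples, and a uniform selector bit decides which one the learner trains on; the other is the ghost coordinate. The learner sees only the selected column.

All function names, the two toy learners, and the integer data distribution are hypothetical illustrations.

- **Memorizer:** its output reveals the selector almost surely, so its CMI is approximately n log 2 nats. Its gap should come out close to 1, inside the bound sqrt(2 log 2) ≈ 1.18.
- **Constant learner:** it reveals nothing about the selector and has zero gap.

```python
import math
import random


def supersample_gap(n, learner, loss, sample, trials=500, seed=0):
    """Monte Carlo estimate of E[gen] under the batch supersample construction.

    Each trial draws an n x 2 supersample and an independent uniform selector
    U in {0,1}^n. It trains on the selected column and averages
    (ghost loss - training loss) over the n rows.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        z = [(sample(rng), sample(rng)) for _ in range(n)]  # supersample rows
        u = [rng.randrange(2) for _ in range(n)]  # selector bits
        train = [z[i][u[i]] for i in range(n)]  # the learner sees only these
        w = learner(train)
        gaps = [loss(w, z[i][1 - u[i]]) - loss(w, z[i][u[i]]) for i in range(n)]
        total += sum(gaps) / n
    return total / trials


def cmi_bound(cmi_nats, n):
    """Batch bound |E[gen]| <= sqrt(2 * CMI / n) for losses in [0, 1]."""
    return math.sqrt(2.0 * cmi_nats / n)


# Hypothetical toy learners on integer-valued data (illustration only).
def memorizer(train):
    return set(train)  # the "model" is the training set itself


def zero_one_memo_loss(w, z):
    return 0.0 if z in w else 1.0  # 0 on memorized points, 1 elsewhere


def constant_learner(train):
    return None  # ignores its data entirely


def constant_loss(w, z):
    return 0.5  # same loss everywhere, so every ghost-minus-train term is 0


def draw(rng):
    return rng.randrange(10**6)  # large support keeps collisions rare


if __name__ == "__main__":
    n = 20
    gap_memo = supersample_gap(n, memorizer, zero_one_memo_loss, draw)
    gap_const = supersample_gap(n, constant_learner, constant_loss, draw)
    # The memorizer's output determines U (up to rare collisions): CMI ~ n*log(2).
    bound = cmi_bound(n * math.log(2), n)
    print(f"memorizer gap ~ {gap_memo:.3f}, bound {bound:.3f}")
    print(f"constant-learner gap = {gap_const:.3f}")
```

In the sequential setting of the abstract, the learner's state evolves round by round along a causal trajectory. This batch sketch is therefore only a reference point for that setting.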
Problem

Research questions and friction points this paper is trying to address.

sequential decision making
generalization bounds
conditional mutual information
online learning
stochastic multi-armed bandits
Innovation

Methods, ideas, or system contributions that make the work stand out.

sequential CMI
supersample framework
information-theoretic generalization bounds
online learning
stochastic bandits