Uncertainty Sets for Distributionally Robust Bandits Using Structural Equation Models

📅 2025-08-04
🤖 AI Summary
Problem: Conventional uncertainty sets in distributionally robust bandits are overly conservative, which leads to substantial estimation bias and degraded policy performance under distributional shift. Method: This paper proposes a causally driven robust approach based on structural equation models (SEMs). It models the causal structure of the environment, uses conditional independence tests to identify which variables have actually shifted, and constructs compact, problem-specific uncertainty sets via mathematical programs. Contribution/Results: To our knowledge, this is the first work to integrate SEMs into distributionally robust bandits, enabling interpretable and adaptive uncertainty set construction. Compared with traditional approaches, the method gives more accurate evaluations and learns lower-variance policies, particularly under large shifts. When the SEM is sufficiently well-specified, the approach learns an optimal policy. Code is publicly available.

📝 Abstract
Distributionally robust evaluation estimates the worst-case expected return over an uncertainty set of possible covariate and reward distributions, and distributionally robust learning finds a policy that maximizes that worst-case return across that uncertainty set. Unfortunately, current methods for distributionally robust evaluation and learning create overly conservative evaluations and policies. In this work, we propose a practical bandit evaluation and learning algorithm that tailors the uncertainty set to specific problems using mathematical programs constrained by structural equation models. Further, we show how conditional independence testing can be used to detect shifted variables for modeling. We find that the structural equation model (SEM) approach gives more accurate evaluations and learns lower-variance policies than traditional approaches, particularly for large shifts. Further, the SEM approach learns an optimal policy, assuming the model is sufficiently well-specified.
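The two definitions in the abstract (worst-case evaluation, then learning a policy that maximizes the worst case) can be illustrated with a toy sketch. This is not the authors' algorithm: the binary context `X`, the `REWARD` table, and the interval bounds on P(X=1) are all hypothetical, and the only structural assumption is that the covariate distribution may shift while the reward mechanism stays fixed.

```python
from itertools import product

# Hypothetical toy problem: binary context X, binary action A.
# REWARD[x][a] is the expected reward, assumed invariant under shift.
REWARD = {0: {0: 1.0, 1: 0.2}, 1: {0: 0.0, 1: 0.8}}


def expected_return(policy, p_x1):
    """Expected return of a deterministic policy when P(X=1) = p_x1."""
    return (1 - p_x1) * REWARD[0][policy[0]] + p_x1 * REWARD[1][policy[1]]


def worst_case_return(policy, p_lo, p_hi):
    """Robust evaluation: minimum return over P(X=1) in [p_lo, p_hi].

    The return is linear in P(X=1), so the minimum sits at an endpoint.
    """
    return min(expected_return(policy, p_lo), expected_return(policy, p_hi))


def robust_policy(p_lo, p_hi):
    """Robust learning: pick the policy with the best worst-case return."""
    policies = [{0: a0, 1: a1} for a0, a1 in product((0, 1), repeat=2)]
    return max(policies, key=lambda pi: worst_case_return(pi, p_lo, p_hi))


pi = robust_policy(0.3, 0.6)
print("robust policy:", pi)
print("tailored set [0.3, 0.6]:", worst_case_return(pi, 0.3, 0.6))
print("broad set    [0.0, 1.0]:", worst_case_return(pi, 0.0, 1.0))
```

In this toy setting the same policy is evaluated at roughly 0.88 under the narrow interval and 0.8 under the unconstrained one, which is the kind of conservatism a problem-specific uncertainty set is meant to reduce.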
Problem

Research questions and friction points this paper is trying to address.

Overcoming conservatism in distributionally robust bandit evaluation
Tailoring uncertainty sets using structural equation models
Improving policy accuracy under large distribution shifts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses structural equation models for uncertainty sets
Applies conditional independence testing for shifted variables
Optimizes policies via tailored mathematical programs
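The idea of using conditional independence testing to find shifted variables can be sketched as follows. This is a simplified stand-in, not the paper's procedure: it assumes a hypothetical two-variable linear SEM (X causes Y), an environment label `env`, and a crude test that only compares means via permutation. A variable is flagged as shifted when it depends on the environment given its parents, approximated here by testing regression residuals.

```python
import random
from statistics import fmean


def ols_residuals(y, x):
    """Residuals of a simple least-squares fit y ~ a + b * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def permutation_pvalue(values, env, n_perm=500, seed=0):
    """Permutation test for a difference in means between env 0 and env 1."""
    rng = random.Random(seed)

    def mean_gap(labels):
        g0 = [v for v, e in zip(values, labels) if e == 0]
        g1 = [v for v, e in zip(values, labels) if e == 1]
        return abs(fmean(g1) - fmean(g0))

    observed = mean_gap(env)
    labels = list(env)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        hits += mean_gap(labels) >= observed
    return (hits + 1) / (n_perm + 1)


# Synthetic data (assumption): only X shifts between environments;
# the mechanism Y = 2 * X + noise is the same in both.
rng = random.Random(1)
n = 200
env = [0] * n + [1] * n
x = [rng.gauss(0.0, 1.0) for _ in range(n)] + [rng.gauss(2.0, 1.0) for _ in range(n)]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

p_x = permutation_pvalue(x, env)                          # X vs. environment
p_y_marginal = permutation_pvalue(y, env)                 # Y vs. environment
p_y_given_x = permutation_pvalue(ols_residuals(y, x), env)  # Y vs. environment given X
print(p_x, p_y_marginal, p_y_given_x)
```

The marginal test on Y picks up the shift inherited from X, whereas the residual-based test asks whether Y's own mechanism changed. A practical implementation would use a proper conditional independence test rather than a comparison of residual means.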
Katherine Avery
College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01002
Chinmay Pendse
College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01002
David Jensen
Professor of Computer Science, University of Massachusetts Amherst
Machine Learning, Causation, Causal Discovery, Statistical Relational Learning, Computational Social Science