Uncertainty Sets for Distributionally Robust Bandits Using Structural Equation Models

📅 2025-08-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Conventional uncertainty sets in distributionally robust bandits are overly conservative, leading to biased evaluations and degraded policy performance under distributional shift. Method: This paper proposes a causally driven robust approach based on structural equation models (SEMs). It models the causal structure of the environment, uses conditional independence tests to identify which variables have actually shifted, and constructs compact, problem-specific uncertainty sets via mathematical programs constrained by the SEM. Contribution/Results: To our knowledge, this is the first work to integrate SEMs into distributionally robust bandits, enabling interpretable and adaptive uncertainty set construction. Compared with traditional approaches, the method gives more accurate evaluations and learns lower-variance policies, particularly under large shifts. When the SEM is sufficiently well-specified, the approach learns an optimal policy. Code is publicly available.

📝 Abstract

Distributionally robust evaluation estimates the worst-case expected return over an uncertainty set of possible covariate and reward distributions, and distributionally robust learning finds a policy that maximizes that worst-case return across that uncertainty set. Unfortunately, current methods for distributionally robust evaluation and learning create overly conservative evaluations and policies. In this work, we propose a practical bandit evaluation and learning algorithm that tailors the uncertainty set to specific problems using mathematical programs constrained by structural equation models. Further, we show how conditional independence testing can be used to detect shifted variables for modeling. We find that the structural equation model (SEM) approach gives more accurate evaluations and learns lower-variance policies than traditional approaches, particularly for large shifts. Further, the SEM approach learns an optimal policy, assuming the model is sufficiently well-specified.
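The two definitions in the abstract can be made concrete with a toy example. The sketch below is not the paper's algorithm. It assumes a hypothetical two-context bandit with context-free arms, where the only shifted quantity is p = P(X = 1), and it uses an interval for p as a stand-in for the uncertainty set. The reward table, the interval bounds, and all function names are invented for illustration. The wide interval plays the role of a conventional, conservative set, and the narrow one plays the role of a set tightened by structural knowledge.

```python
# Hypothetical mean rewards, indexed as REWARD[context][arm].
REWARD = [[1.0, 0.6],
          [0.0, 0.5]]


def value(arm, p):
    """Expected return of always playing `arm` when P(X = 1) = p."""
    return (1 - p) * REWARD[0][arm] + p * REWARD[1][arm]


def worst_case_value(arm, p_lo, p_hi):
    """Robust evaluation: minimum return over p in [p_lo, p_hi].

    The return is linear in p, so the minimum is attained at an endpoint.
    """
    return min(value(arm, p_lo), value(arm, p_hi))


def robust_arm(p_lo, p_hi):
    """Robust learning: the arm that maximizes the worst-case return."""
    return max((0, 1), key=lambda arm: worst_case_value(arm, p_lo, p_hi))


# Assumed uncertainty sets: every possible shift versus a narrow, tailored range.
wide_set = (0.0, 1.0)
tight_set = (0.2, 0.4)

for name, (lo, hi) in (("wide", wide_set), ("tight", tight_set)):
    arm = robust_arm(lo, hi)
    print(name, "-> arm", arm, "worst-case value", worst_case_value(arm, lo, hi))
```

In this toy setting the wide set drives the learner to the arm that is safest under every conceivable shift, while the tight set admits a higher worst-case return and selects a different arm. That is the kind of conservatism the abstract attributes to untailored uncertainty sets.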

Problem

Research questions and friction points this paper is trying to address.

Overcoming conservatism in distributionally robust bandit evaluation

Tailoring uncertainty sets using structural equation models

Improving evaluation accuracy and reducing policy variance under large distribution shifts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses structural equation models for uncertainty sets

Applies conditional independence testing for shifted variables

Optimizes policies via tailored mathematical programs
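One way to read the conditional independence idea above: pool data from the source and target environments, add an environment indicator E, and flag a variable V as shifted if V still depends on E after conditioning on V's parents Z. The summary does not say which test the paper uses, so the sketch below substitutes a generic stratified permutation test. It assumes a discrete parent Z and a binary E, and it only detects shifts in the conditional mean of V. All names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_gap(v, e, z):
    """Sum over strata of z of |mean(v | e=1) - mean(v | e=0)|."""
    cells = defaultdict(lambda: ([], []))  # stratum -> (values with e=0, values with e=1)
    for vi, ei, zi in zip(v, e, z):
        cells[zi][ei].append(vi)
    gap = 0.0
    for v0, v1 in cells.values():
        if v0 and v1:  # skip strata that lack one of the environments
            gap += abs(sum(v1) / len(v1) - sum(v0) / len(v0))
    return gap


def shift_p_value(v, e, z, n_perm=200, seed=0):
    """Permutation p-value for the null hypothesis: V independent of E given Z."""
    rng = random.Random(seed)
    observed = stratified_gap(v, e, z)
    strata = defaultdict(list)  # stratum -> sample indices
    for i, zi in enumerate(z):
        strata[zi].append(i)
    hits = 0
    for _ in range(n_perm):
        e_perm = list(e)
        # Shuffle environment labels within each stratum so Z stays fixed.
        for idx in strata.values():
            labels = [e[i] for i in idx]
            rng.shuffle(labels)
            for i, label in zip(idx, labels):
                e_perm[i] = label
        if stratified_gap(v, e_perm, z) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: two parent strata, 20 samples per environment in each stratum.
z = [0] * 40 + [1] * 40
e = ([0] * 20 + [1] * 20) * 2
v_shifted = list(e)       # V tracks the environment exactly: a clear shift
v_stable = [0, 1] * 40    # same conditional mean in both environments

print("shifted variable p-value:", shift_p_value(v_shifted, e, z))
print("stable variable p-value:", shift_p_value(v_stable, e, z))
```

Variables with a small p-value would be the candidates to include in the uncertainty set, and the rest would be held at their source-environment mechanisms. A real application would need a test suited to continuous variables and some correction for testing many variables at once.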

🔎 Similar Papers

RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions