🤖 AI Summary
Problem: Conventional uncertainty sets in distributionally robust bandits are overly conservative, which makes worst-case evaluations inaccurate and degrades policy performance under distributional shift.
Method: This paper proposes a causally driven robust approach based on structural equation models (SEMs). It models the causal structure of the environment, uses conditional independence tests to identify which variables have actually shifted, and then builds a compact, problem-specific uncertainty set via mathematical programs constrained by the SEM.
Contribution/Results: To our knowledge, this is the first work to integrate SEMs into distributionally robust bandits, enabling interpretable and adaptive uncertainty set construction. Compared with traditional approaches, the method gives more accurate evaluations and learns lower-variance policies, particularly under large distributional shifts. When the SEM is sufficiently well-specified, the method learns an optimal policy. Code is publicly available.
📝 Abstract
Distributionally robust evaluation estimates the worst-case expected return over an uncertainty set of possible covariate and reward distributions, and distributionally robust learning finds a policy that maximizes this worst-case return. Unfortunately, current methods for distributionally robust evaluation and learning produce overly conservative evaluations and policies. In this work, we propose a practical bandit evaluation and learning algorithm that tailors the uncertainty set to the specific problem using mathematical programs constrained by structural equation models (SEMs). We also show how conditional independence testing can be used to detect the shifted variables to model. We find that the SEM approach gives more accurate evaluations and learns lower-variance policies than traditional approaches, particularly for large shifts. Moreover, the SEM approach learns an optimal policy, assuming the model is sufficiently well-specified.
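
To make "worst-case expected return over an uncertainty set" concrete, the sketch below evaluates a sample of returns under a conventional KL-divergence ball around the empirical distribution. This is the kind of generic baseline the abstract calls overly conservative, not the SEM-constrained program proposed in the paper. The function name `kl_worst_case_mean`, the radius `delta`, the grid of dual variables, and the sample rewards are all illustrative assumptions. It relies on the standard dual form of the KL-constrained problem, sup over alpha > 0 of -alpha * log E[exp(-R / alpha)] - alpha * delta, and searches alpha on a grid, so the result is a lower bound on the exact worst case.

```python
import math


def kl_worst_case_mean(rewards, delta, alphas=None):
    """Lower bound on the worst-case mean reward over a KL ball.

    Approximates  inf_{Q : KL(Q || P_n) <= delta} E_Q[R]  through its dual
    sup_{alpha > 0} -alpha * log E_{P_n}[exp(-R / alpha)] - alpha * delta,
    where P_n is the empirical distribution of `rewards`.
    The grid search over alpha is an illustrative simplification.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if delta < 0:
        raise ValueError("delta must be non-negative")
    if alphas is None:
        # Log-spaced grid from 1e-3 to 1e4 (hypothetical default).
        alphas = [10 ** (k / 10) for k in range(-30, 41)]

    n = len(rewards)
    r_min = min(rewards)
    best = -math.inf
    for alpha in alphas:
        # Shift by r_min so every exponent is <= 0 (no overflow).
        mean_exp = sum(math.exp(-(r - r_min) / alpha) for r in rewards) / n
        value = r_min - alpha * math.log(mean_exp) - alpha * delta
        best = max(best, value)
    return best


if __name__ == "__main__":
    # Hypothetical observed returns of one policy.
    sample = [0.2, 0.9, 0.5, 0.7, 0.4]
    for radius in (0.0, 0.1, 0.5):
        print(radius, kl_worst_case_mean(sample, radius))
```

With `delta = 0` the bound sits at the empirical mean, and it falls toward the smallest observed reward as `delta` grows. A generic ball of this kind allows every variable to shift at once, which is one way to see why an uncertainty set restricted to the variables that actually shift can be less conservative.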