🤖 AI Summary
Real-world causal decision-making problems can often be modeled at several levels of abstraction, yet existing causal multi-armed bandit (CMAB) methods lack a general way to exploit the information and computational advantages of each level. To address this, we propose AT-UCB, an algorithm that uses causal abstraction (CA) theory to share information between CMAB instances defined at different levels of abstraction. AT-UCB first explores a cheap-to-simulate, coarse-grained CMAB instance, then runs the standard upper confidence bound (UCB) algorithm on a restricted set of potentially optimal actions in the fine-grained CMAB of interest. We support the approach with a novel upper bound on the cumulative regret and with experiments on epidemiological simulators of varying resolution and computational cost, where AT-UCB significantly reduces cumulative regret relative to classical UCB.
📝 Abstract
Although real-world decision-making problems can often be encoded as causal multi-armed bandits (CMABs) at different levels of abstraction, a general methodology exploiting the information and computational advantages of each abstraction level is missing. In this paper, we propose AT-UCB, an algorithm which efficiently exploits shared information between CMAB problem instances defined at different levels of abstraction. More specifically, AT-UCB leverages causal abstraction (CA) theory to explore within a cheap-to-simulate and coarse-grained CMAB instance, before employing the traditional upper confidence bound (UCB) algorithm on a restricted set of potentially optimal actions in the CMAB of interest, leading to significant reductions in cumulative regret when compared to the classical UCB algorithm. We illustrate the advantages of AT-UCB theoretically, through a novel upper bound on the cumulative regret, and empirically, by applying AT-UCB to epidemiological simulators with varying resolution and computational cost.
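The two-stage idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's AT-UCB: the abstract does not specify how the coarse instance is explored or how the action set is restricted. The sketch assumes UCB1 in both stages, rewards in [0, 1], a hypothetical dictionary `to_coarse` standing in for the abstraction map from fine-grained to coarse-grained actions, and a hypothetical `tolerance` parameter that keeps coarse actions whose estimated reward is close to the best estimate. All arm names and reward probabilities are invented for illustration.

```python
import math
import random


def ucb1(pull, arms, horizon):
    """Run UCB1 on `arms` for `horizon` rounds.

    `pull(arm)` returns a reward in [0, 1].
    Returns (empirical means, pull counts).
    """
    if horizon < len(arms):
        raise ValueError("horizon must allow each arm to be pulled once")
    counts = {a: 0 for a in arms}
    totals = {a: 0.0 for a in arms}
    for t in range(1, horizon + 1):
        untried = [a for a in arms if counts[a] == 0]
        if untried:
            arm = untried[0]  # initialisation: pull every arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                arms,
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        totals[arm] += pull(arm)
        counts[arm] += 1
    means = {a: totals[a] / counts[a] for a in arms}
    return means, counts


def two_stage_ucb(pull_coarse, pull_fine, to_coarse,
                  coarse_horizon, fine_horizon, tolerance):
    """Explore the coarse instance, then run UCB1 on a restricted fine action set.

    `to_coarse` (assumption) maps each fine action to its coarse counterpart.
    `tolerance` (assumption) keeps coarse actions within this margin of the
    best coarse estimate; it is not a quantity taken from the paper.
    """
    coarse_arms = sorted(set(to_coarse.values()))
    # Stage 1: cheap exploration in the coarse-grained instance.
    coarse_means, _ = ucb1(pull_coarse, coarse_arms, coarse_horizon)
    best = max(coarse_means.values())
    kept = {c for c, m in coarse_means.items() if m >= best - tolerance}
    # Stage 2: UCB1 only on fine actions that map to a retained coarse action.
    restricted = [a for a, c in to_coarse.items() if c in kept]
    fine_means, fine_counts = ucb1(pull_fine, restricted, fine_horizon)
    return restricted, fine_means, fine_counts


# Toy usage with hypothetical Bernoulli reward probabilities.
rng = random.Random(0)
to_coarse = {"a": "low", "b": "low", "c": "high", "d": "high"}
coarse_p = {"low": 0.2, "high": 0.8}
fine_p = {"a": 0.1, "b": 0.3, "c": 0.7, "d": 0.9}
restricted, means, counts = two_stage_ucb(
    lambda c: float(rng.random() < coarse_p[c]),
    lambda a: float(rng.random() < fine_p[a]),
    to_coarse, coarse_horizon=200, fine_horizon=500, tolerance=0.2,
)
print(restricted, counts)
```

The restriction rule here compares empirical means rather than confidence bounds, because UCB1 by design keeps the upper bounds of suboptimal arms close to that of the best arm, so a bound-based filter after a UCB1 run would rarely discard anything. A faithful implementation would also need to account for abstraction error between the two instances, which this sketch folds into `tolerance`.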