Adaptive Policy Learning Under Unknown Network Interference

📅 2026-05-11

🤖 AI Summary
This study addresses adaptive experimentation under unknown network interference, where the goal is to infer the interference structure among units while optimizing individual-level treatment allocations to maximize cumulative reward. The authors propose a Thompson sampling algorithm, implemented with a Gibbs sampler, that jointly learns the interference network and the treatment policy; the estimated network supports downstream estimation of direct, indirect, and total causal effects. The theoretical analysis establishes a Bayesian regret upper bound for additive spillover models and an information-theoretic lower bound. Empirically, the Gibbs-based sampler achieves regret consistent with the proven rate, reduces regret by more than an order of magnitude in head-to-head comparisons, and on two real-world networks attains sublinear regret with small RMSE in downstream effect estimates.
📝 Abstract
Adaptive experimentation under unknown network interference requires solving two coupled problems: (i) learning the underlying dynamics of interference among units and (ii) using these dynamics to inform treatment allocation in order to maximize a cumulative outcome of interest (e.g., revenue). Existing adaptive experimentation methods either assume the interference network is fully known or bypass the network by operating on coarse cluster-level randomizations. We develop a Thompson sampling algorithm that jointly learns the interference network and adaptively optimizes individual-level treatment allocations via a Gibbs sampler. The algorithm returns both an optimized treatment policy and an estimate of the interference network; the latter supports downstream causal analyses such as estimation of direct, indirect, and total treatment effects. For additive spillover models, we show that total reward is linear in the treatment vector with coefficients given by an $n$-dimensional latent score. We prove a Bayesian regret bound of order $\sqrt{nT \cdot B \log(en/B)}$ for exact posterior sampling; empirically, our Gibbs-based approximate sampler achieves regret consistent with this rate and remains sublinear when the additive spillover assumption is violated. For general neighborhood interference, where this reduction is unavailable, we analyze an explore-then-commit variant with $O(n^2 \log T)$ graph-discovery cost. An information-theoretic $\Omega(n \log T)$ lower bound complements both results. Empirically, our method achieves more than an order-of-magnitude reduction in regret in head-to-head comparisons. On two real-world networks, the algorithm achieves sublinear regret and yields downstream effect estimates with small RMSE relative to the truth.
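The linear reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a Gaussian prior on the latent score, Gaussian reward noise, and an unconstrained binary allocation, and it replaces the Gibbs sampler with an exact conjugate Gaussian update. It also does not learn the network itself, only the aggregated score. The function name `thompson_linear` and all of its parameters are hypothetical.

```python
import numpy as np

def thompson_linear(scores, T=500, noise_sd=1.0, prior_var=1.0, seed=0):
    """Thompson sampling when total reward is r_t = a_t @ scores + noise,
    with a_t in {0,1}^n (assumed model, for illustration only)."""
    rng = np.random.default_rng(seed)
    n = scores.size
    precision = np.eye(n) / prior_var       # posterior precision of the score
    b = np.zeros(n)                         # precision-weighted posterior mean
    best = np.clip(scores, 0, None).sum()   # expected reward of the oracle allocation
    regret = np.zeros(T)
    for t in range(T):
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2             # symmetrize against round-off
        theta = rng.multivariate_normal(cov @ b, cov)  # one posterior draw
        a = (theta > 0).astype(float)       # treat units whose sampled score is positive
        reward = a @ scores + noise_sd * rng.normal()  # only the total is observed
        precision += np.outer(a, a) / noise_sd**2      # conjugate Gaussian update
        b += a * reward / noise_sd**2
        regret[t] = best - a @ scores
    post_mean = np.linalg.inv(precision) @ b
    return np.cumsum(regret), post_mean

# Hypothetical latent scores for five units.
scores = np.array([1.0, -0.5, 0.8, -1.2, 0.3])
cum_regret, post_mean = thompson_linear(scores, T=300)
```

Because only the total reward is observed each round, the individual scores are identified through the variation in allocations that posterior sampling induces; that is the role exploration plays in this simplified setting.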
Problem

Research questions and friction points this paper is trying to address.

network interference
adaptive experimentation
treatment allocation
causal inference
Thompson sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Thompson Sampling
Network Interference
Adaptive Experimentation
Causal Inference
Gibbs Sampler