Optimal Stopping for Sequential Bayesian Experimental Design

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
In sequential Bayesian experimental design, pre-specifying a fixed number of experiments fails to accommodate dynamic real-world requirements, making a principled solution to the fundamental "optimal stopping" problem necessary. This paper pioneers the integration of optimal stopping theory into this framework by formulating a Markov decision process that jointly optimizes stopping policies and experimental designs; the authors prove that the optimal stopping rule balances the immediate terminal reward against the expected value of continuing. To resolve the circular dependency between the stopping and design policies during training, they propose a curriculum learning strategy that stabilizes convergence. The method unifies Bayesian inference, policy gradient optimization, and curriculum learning. Empirical evaluation on linear-Gaussian benchmark tasks and a contaminant source localization problem demonstrates significant improvements in estimation accuracy and sampling efficiency over baseline methods (e.g., fixed-threshold rules), especially under strong sequential dependencies.

📝 Abstract
In sequential Bayesian experimental design, the number of experiments is usually fixed in advance. In practice, however, campaigns may terminate early, raising the fundamental question: when should one stop? Threshold-based rules are simple to implement but inherently myopic, as they trigger termination based on a fixed criterion while ignoring the expected future information gain that additional experiments might provide. We develop a principled Bayesian framework for optimal stopping in sequential experimental design, formulated as a Markov decision process where stopping and design policies are jointly optimized. We prove that the optimal rule is to stop precisely when the immediate terminal reward outweighs the expected continuation value. To learn such policies, we introduce a policy gradient method, but show that naïve joint optimization suffers from circular dependencies that destabilize training. We resolve this with a curriculum learning strategy that gradually transitions from forced continuation to adaptive stopping. Numerical studies on a linear-Gaussian benchmark and a contaminant source detection problem demonstrate that curriculum learning achieves stable convergence and outperforms vanilla methods, particularly in settings with strong sequential dependencies.
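The stopping rule described in the abstract has the standard Bellman form for an optimal stopping problem. The symbols below are generic placeholders (belief state \(b_t\), design \(d\), observation \(y\)), not necessarily the paper's own notation:

```latex
V_t(b_t) \;=\; \max\Big\{\,
  \underbrace{R_{\mathrm{stop}}(b_t)}_{\text{immediate terminal reward}},\;
  \underbrace{\max_{d}\; \mathbb{E}_{y \sim p(y \mid d,\, b_t)}\big[\, V_{t+1}(b_{t+1}) \,\big]}_{\text{expected continuation value}}
\,\Big\},
```

where \(b_{t+1}\) is the posterior belief after observing \(y\) under design \(d\). The optimal rule is then: stop at time \(t\) iff \(R_{\mathrm{stop}}(b_t)\) is at least the expected continuation value, which is exactly the trade-off the abstract states.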
Problem

Research questions and friction points this paper is trying to address.

Determining optimal stopping time in sequential experiments
Addressing myopic limitations of threshold-based stopping rules
Developing stable policy optimization for joint stopping-design decisions
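To make the "myopic threshold rule" concrete, here is a minimal sketch in a toy linear-Gaussian setting: each experiment adds precision to a conjugate Gaussian posterior, and the rule stops as soon as the posterior variance crosses a fixed threshold, with no look-ahead. All names and numbers here are illustrative, not from the paper:

```python
def posterior_variance_update(var, design, noise_var=0.1):
    """Conjugate linear-Gaussian update: posterior precisions add."""
    return 1.0 / (1.0 / var + design ** 2 / noise_var)


def threshold_stop(var, tau=0.05):
    """Myopic fixed-threshold rule: stop once current uncertainty drops
    below tau, ignoring how informative the next experiment would be."""
    return var < tau


# Run experiments with a constant design until the threshold triggers.
var, steps = 1.0, 0
while not threshold_stop(var):
    var = posterior_variance_update(var, design=1.0)
    steps += 1
print(steps, var)  # stops after 2 experiments with var = 1/21
```

The rule fires purely on the current state; an optimal-stopping policy would instead compare the terminal reward at `var` against the expected value of continuing.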
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal stopping formulated as Markov decision process
Policy gradient method with curriculum learning strategy
Gradual transition from forced continuation to adaptive stopping
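The curriculum idea above can be sketched as a gate on the stop action that is held at zero during a warmup phase (forced continuation) and then ramped up to one (adaptive stopping). The linear schedule and the `warmup_frac` hyperparameter are illustrative assumptions, not the paper's exact schedule:

```python
def stop_gate(epoch, total_epochs, warmup_frac=0.3):
    """Curriculum gate for the stop action.

    During an initial warmup phase the stop action is disabled, so the
    agent is forced to continue and the design policy trains on full-length
    trajectories. Afterwards the gate ramps linearly from 0 to 1, gradually
    handing control over to the learned stopping policy.
    """
    warmup = int(warmup_frac * total_epochs)
    if epoch < warmup:
        return 0.0  # forced continuation
    return min(1.0, (epoch - warmup) / max(1, total_epochs - warmup))
```

In a training loop, the effective stop probability at each step would be the gate times the policy's own stop probability, e.g. `stop_gate(epoch, 100) * pi_stop(state)`, which avoids the circular dependency of training both policies jointly from scratch.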