Sequential Bayesian Optimal Experimental Design in Infinite Dimensions via Policy Gradient Reinforcement Learning

📅 2026-01-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high computational cost of sequential Bayesian optimal experimental design (SBOED) for inverse problems governed by partial differential equations with infinite-dimensional random field parameters. The authors formulate the problem as a finite-horizon Markov decision process and use policy-gradient reinforcement learning to train an amortized design policy that selects sensor locations online from the experiment history. To keep policy training and reward evaluation tractable, they combine two dimension reductions (active subspace projection for the parameter and principal component analysis for the state) with a derivative-informed latent attention neural operator (LANO) surrogate that predicts both the parameter-to-solution map and its Jacobian. They also introduce an eigenvalue-based evaluation strategy that uses prior samples as proxies for maximum a posteriori (MAP) points, which avoids repeated MAP solves. In a contaminant source tracking task, the method achieves approximately 100× speedup over high-fidelity finite element simulations, outperforms random sensor placement, and learns a physically interpretable “upstream” tracking strategy.
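To make the policy-gradient idea concrete, here is a minimal REINFORCE sketch on a toy finite-horizon problem. It is not the paper's setup: the 1-D grid, the reward `toy_reward` (a stand-in for an information-gain reward that grows toward a hypothetical upstream end at position 0), the tabular softmax policy indexed only by step, and all names and hyperparameters are assumptions made for illustration. The actual method conditions the policy on the experiment history and evaluates rewards through a surrogate model.

```python
import math
import random

ACTIONS = (-1, 0, 1)  # move upstream, stay, move downstream on a toy 1-D grid
GRID_MAX = 4          # positions 0..4; position 0 is the hypothetical upstream end
HORIZON = 3           # number of sequential design steps

def softmax(logits):
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    return [v / s for v in e]

def toy_reward(x):
    # Stand-in for an information-gain reward: larger closer to upstream.
    return (GRID_MAX - x) / GRID_MAX

def rollout(theta, rng):
    # Sample one episode; returns a list of (step, action index, reward).
    x, traj = 2, []
    for t in range(HORIZON):
        a = rng.choices(range(3), weights=softmax(theta[t]))[0]
        x = min(GRID_MAX, max(0, x + ACTIONS[a]))
        traj.append((t, a, toy_reward(x)))
    return traj

def train(iters=500, batch=32, lr=0.5, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * 3 for _ in range(HORIZON)]  # policy logits per step
    for _ in range(iters):
        episodes = [rollout(theta, rng) for _ in range(batch)]
        # Return-to-go G_t for every episode and step.
        rtg = []
        for ep in episodes:
            g, gs = 0.0, [0.0] * HORIZON
            for t, _, r in reversed(ep):
                g += r
                gs[t] = g
            rtg.append(gs)
        # Batch-mean baseline per step reduces gradient variance.
        base = [sum(gs[t] for gs in rtg) / batch for t in range(HORIZON)]
        grad = [[0.0] * 3 for _ in range(HORIZON)]
        for ep, gs in zip(episodes, rtg):
            for t, a, _ in ep:
                p = softmax(theta[t])
                adv = gs[t] - base[t]
                for k in range(3):
                    # grad of log softmax: one-hot(a) - p
                    grad[t][k] += adv * ((1.0 if k == a else 0.0) - p[k]) / batch
        for t in range(HORIZON):
            for k in range(3):
                theta[t][k] += lr * grad[t][k]  # gradient ascent on expected return
    return [softmax(row) for row in theta]

if __name__ == "__main__":
    for t, p in enumerate(train()):
        print(t, [round(v, 3) for v in p])
```

Once trained, choosing the next design only requires evaluating the policy, which is the sense in which such a policy is amortized: no optimization problem is solved at experiment time.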

📝 Abstract

Sequential Bayesian optimal experimental design (SBOED) for PDE-governed inverse problems is computationally challenging, especially for infinite-dimensional random field parameters. High-fidelity approaches require repeated forward and adjoint PDE solves inside nested Bayesian inversion and design loops. We formulate SBOED as a finite-horizon Markov decision process and learn an amortized design policy via policy-gradient reinforcement learning (PGRL), enabling online design selection from the experiment history without repeatedly solving an SBOED optimization problem. To make policy training and reward evaluation scalable, we combine dual dimension reduction (active subspace projection for the parameter and principal component analysis for the state) with an adjusted derivative-informed latent attention neural operator (LANO) surrogate that predicts both the parameter-to-solution map and its Jacobian. We use a Laplace-based D-optimality reward while noting that, in general, other expected-information-gain utilities such as KL divergence can also be used within the same framework. We further introduce an eigenvalue-based evaluation strategy that uses prior samples as proxies for maximum a posteriori (MAP) points, avoiding repeated MAP solves while retaining accurate information-gain estimates. Numerical experiments on sequential multi-sensor placement for contaminant source tracking demonstrate approximately 100× speedup over high-fidelity finite element methods, improved performance over random sensor placements, and physically interpretable policies that discover an “upstream” tracking strategy.
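As a rough illustration of what a Laplace-based D-optimality reward computes, consider a linearized Gaussian model with Jacobian J (one row per sensor observation), a diagonal prior covariance C in the reduced parameter basis, and i.i.d. noise of variance σ². Under these simplifying assumptions the information gain is ½ log det(I + J C Jᵀ / σ²), which equals ½ Σ log(1 + λᵢ) over the eigenvalues λᵢ of the prior-preconditioned data-misfit Hessian. The sketch below evaluates the log-determinant through a small Cholesky factorization; the function names, arguments, and the diagonal-prior assumption are illustrative and not taken from the paper's implementation.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a small SPD matrix (list of lists)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def d_optimality_reward(jac, prior_var, noise_var):
    """Return 0.5 * log det(I + J C J^T / noise_var).

    jac       : list of rows, one per observation, each of length r
    prior_var : length-r list of prior variances (diagonal C, an assumption)
    noise_var : scalar observation-noise variance
    """
    m, r = len(jac), len(prior_var)
    # Data-space matrix I + J C J^T / sigma^2 (m x m, symmetric positive definite).
    a = [[(1.0 if i == j else 0.0)
          + sum(jac[i][k] * prior_var[k] * jac[j][k] for k in range(r)) / noise_var
          for j in range(m)]
         for i in range(m)]
    low = cholesky(a)
    # 0.5 * log det(A) = sum of log of the Cholesky diagonal.
    return sum(math.log(low[i][i]) for i in range(m))

if __name__ == "__main__":
    one_sensor = [[1.0, 0.5]]
    two_sensors = [[1.0, 0.5], [0.2, 1.0]]
    print(d_optimality_reward(one_sensor, [1.0, 1.0], 0.1))
    print(d_optimality_reward(two_sensors, [1.0, 1.0], 0.1))
```

In a sequential setting, one plausible per-step reward is the increase in this quantity when the row for a newly placed sensor is appended to `jac`, with the Jacobian evaluated at whichever linearization point is in use (a MAP point or, as the abstract proposes, a prior sample serving as a proxy).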

Problem

Research questions and friction points this paper is trying to address.

Sequential Bayesian optimal experimental design

Infinite-dimensional parameters

PDE-constrained inverse problems

Computational scalability

Optimal sensor placement

Innovation

Methods, ideas, or system contributions that make the work stand out.

policy gradient reinforcement learning

amortized design policy

derivative-informed neural operator