🤖 AI Summary
To address unreliable simulation-based inference (SBI) caused by simulator misspecification, this paper proposes Robust Posterior Estimation (RoPE), a framework that calibrates SBI using a small set of real-world calibration data. Methodologically, RoPE formulates the misspecification gap between simulated and real data as an optimal transport (OT) problem, without assuming any parametric form for the misspecification, a departure from prior work. It combines deep generative modeling with data-driven calibration to enable a controllable trade-off between calibration fidelity and inferential informativeness. Evaluated on four synthetic tasks and two real-world problems, RoPE consistently outperforms existing baselines, yielding posterior distributions that are both informative and probabilistically calibrated, demonstrating robustness to simulator imperfections while preserving statistical reliability.
📝 Abstract
Driven by steady progress in deep generative modeling, simulation-based inference (SBI) has emerged as the workhorse for inferring the parameters of stochastic simulators. However, recent work has demonstrated that model misspecification can compromise the reliability of SBI, preventing its adoption in important applications where only misspecified simulators are available. This work introduces robust posterior estimation (RoPE), a framework that overcomes model misspecification with a small real-world calibration set of ground-truth parameter measurements. We formalize the misspecification gap as the solution of an optimal transport (OT) problem between learned representations of real-world and simulated observations, allowing RoPE to learn a model of the misspecification without placing additional assumptions on its nature. RoPE demonstrates how OT and a calibration set provide a controllable balance between calibrated uncertainty and informative inference, even under severely misspecified simulators. Results on four synthetic tasks and two real-world problems with ground-truth labels demonstrate that RoPE outperforms baselines and consistently returns informative and calibrated credible intervals.
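To make the OT idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of computing an optimal transport coupling between learned representations of simulated and real observations. It assumes equal-size samples with uniform weights, in which case exact OT reduces to a linear assignment problem; the variable names (`sim_repr`, `real_repr`) and the synthetic "misspecification shift" are illustrative only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Stand-ins for encoder outputs: 8 simulated and 8 real observation embeddings.
# The real embeddings are a shifted, noisy version of the simulated ones,
# mimicking a misspecification gap between simulator and reality.
sim_repr = rng.normal(size=(8, 4))
real_repr = sim_repr + 0.5 + 0.1 * rng.normal(size=(8, 4))

# Pairwise squared-Euclidean cost matrix between the two representation sets.
cost = ((sim_repr[:, None, :] - real_repr[None, :, :]) ** 2).sum(axis=-1)

# With equal-size uniform marginals, the exact OT plan is a permutation,
# found by solving the assignment problem on the cost matrix.
rows, cols = linear_sum_assignment(cost)
ot_cost = cost[rows, cols].mean()
print(f"mean transport cost under the OT matching: {ot_cost:.3f}")
```

The matching `(rows, cols)` plays the role of the coupling that aligns simulated and real observations; in RoPE this alignment is what lets the misspecification be modeled without parametric assumptions.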