Mind the Sim-to-Real Gap & Think Like a Scientist

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the sim-to-real gap that arises when a pretrained simulator inherits confounding and drift from its calibration data. It asks when and how a planner facing a sequential decision problem should combine cheap but biased simulation with unbiased real-world experiments, each of which consumes one real unit. The authors extend the simulation lemma to decompose the simulator's value error, and they split the value gap between the simulator-optimal policy and the optimum into a local component and a reachability component. Under purely passive learning, the reachability component stays bounded away from zero at any horizon. To overcome this limitation, they propose Fisher-SEP, an active experimentation strategy based on the Fisher information matrix that minimizes the posterior predictive variance of a target policy's value. Two case studies illustrate the regimes: in a vending-machine supply chain, front-loaded experimentation pays off once the horizon is long enough to amortize the pilot, and in an HIV mobile-testing example, only designed exploration reaches the poorly surveilled region.
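
For orientation, the classical discounted simulation lemma that the paper extends can be written as below. This is the standard textbook form, not the paper's extended decomposition, which this summary does not reproduce. The notation is an assumption introduced here: $M$ is the real system with transitions $P$ and rewards $r \in [0, R_{\max}]$, $\hat M$ is the simulator with $\hat P$ and $\hat r$, $\pi$ is a fixed policy, and $\gamma \in [0, 1)$ is a discount factor.

```latex
% Exact identity: subtract the two Bellman equations
%   V = r^\pi + \gamma P^\pi V   and   \hat V = \hat r^\pi + \gamma \hat P^\pi \hat V.
V^\pi_{M} - V^\pi_{\hat M}
  = \bigl(I - \gamma \hat P^\pi\bigr)^{-1}
    \Bigl[\bigl(r^\pi - \hat r^\pi\bigr)
          + \gamma \bigl(P^\pi - \hat P^\pi\bigr) V^\pi_{M}\Bigr]

% Resulting bound, with
%   \varepsilon_R = \lVert r^\pi - \hat r^\pi \rVert_\infty,
%   \varepsilon_P = \max_s \lVert P^\pi(\cdot \mid s) - \hat P^\pi(\cdot \mid s) \rVert_1 :
\bigl\lVert V^\pi_{M} - V^\pi_{\hat M} \bigr\rVert_\infty
  \le \frac{\varepsilon_R}{1 - \gamma}
    + \frac{\gamma \, \varepsilon_P \, R_{\max}}{(1 - \gamma)^2}
```

The first bracketed term is a reward mismatch and the second a transition mismatch, which is the kind of split the reward-only and transition-only specializations of Fisher-SEP suggest.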

📝 Abstract

Suppose a planner has a pre-trained simulator of a sequential decision problem and the option to run real experiments in the field. The simulator is cheap to query but inherits confounding and drift from its calibration data. Experimentation is unbiased but consumes one real unit per trial. We study when, and how, the planner should supplement the simulator with experiments. We give three results. First, an extended simulation lemma decomposes the simulator's value error into a calibration-deployment shift that randomization can identify and a parametric residual that no further interaction can reduce. Second, the value gap between the simulator-optimal policy and the optimum splits into a local component, on states the deployed policy already visits, and a reachability component, on states it does not. The reachability component stays bounded away from zero at any horizon under purely passive learning. Third, we propose Fisher-SEP, a simulation-aided experimental policy (SEP) that minimizes the posterior predictive variance of a target policy's value, with reward-only and transition-only specializations. Two case studies illustrate the regimes. In a vending-machine supply chain, front-loaded experimentation overtakes posterior updating once the horizon is long enough to amortize the pilot. In an HIV mobile-testing example with a corridor that separates a well-surveilled region from a poorly-surveilled one, only designed exploration reaches the poorly-surveilled region.
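
The idea of choosing experiments to shrink the posterior predictive variance of a target policy's value can be illustrated with a minimal sketch. This is not the authors' Fisher-SEP algorithm. It assumes a toy linear-Gaussian reward model, where each trial at design `x` observes `x @ theta + noise` and the target policy's value is the linear functional `c @ theta`. In that model the Fisher information of one trial is `x x^T / noise_var`, so the posterior covariance has a closed-form rank-one update. All names (`posterior_update`, `greedy_design`, `target_features`) are hypothetical.

```python
import numpy as np


def posterior_update(cov, x, noise_var):
    """Gaussian posterior covariance after one noisy observation of x @ theta.

    Equivalent to inverting (cov^-1 + x x^T / noise_var), i.e. adding the
    Fisher information of the trial to the current precision matrix.
    """
    cov_x = cov @ x
    return cov - np.outer(cov_x, cov_x) / (noise_var + x @ cov_x)


def value_variance(cov, c):
    """Posterior variance of the linear value estimate c @ theta."""
    return float(c @ cov @ c)


def greedy_design(cov, candidates, c, noise_var, budget):
    """Greedily pick the trial that most reduces the variance of c @ theta."""
    chosen = []
    for _ in range(budget):
        variances = [
            value_variance(posterior_update(cov, x, noise_var), c)
            for x in candidates
        ]
        best = int(np.argmin(variances))
        chosen.append(best)
        cov = posterior_update(cov, candidates[best], noise_var)
    return chosen, cov


# Toy setup (assumed): two parameters, a unit-variance prior standing in for
# the simulator, and a target value that depends only on the second parameter.
prior_cov = np.eye(2)
candidates = np.array([[1.0, 0.0], [0.0, 1.0]])
target_features = np.array([0.0, 1.0])

chosen, post_cov = greedy_design(
    prior_cov, candidates, target_features, noise_var=1.0, budget=2
)
final_variance = value_variance(post_cov, target_features)
```

In this toy setup both trials go to the second candidate, the only one informative about the target value, and the value variance falls from 1 to 1/3. Trials at the first candidate would leave it unchanged, a simplified analogue of experimenting only where the deployed policy already collects data.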

Problem

Research questions and friction points this paper is trying to address.

sim-to-real gap

sequential decision making

simulation bias

experimental design

policy evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

sim-to-real gap

simulation-aided experimental policy

Fisher-SEP