Closing the Sim2Real Performance Gap in RL

πŸ“… 2025-10-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
In reinforcement learning, policies trained in simulation often suffer significant performance degradation when deployed in the real world, a phenomenon termed the Sim2Real performance gap. Existing approaches optimize simulators using proxy metrics (e.g., simulation fidelity or variability), which exhibit weak correlation with actual real-world performance. To address this, we propose a bilevel reinforcement learning framework that directly optimizes for real-world performance: the inner loop trains the policy in simulation, while the outer loop jointly adapts simulator parameters and the reward function based on real-world feedback. This eliminates reliance on imperfect proxies and enables adaptive calibration of both the dynamics model and reward structure. Theoretical analysis establishes convergence guarantees under mild assumptions, and extensive experiments across robotic control benchmarks demonstrate that our method substantially narrows the Sim2Real gap and significantly improves policy generalization to physical environments.

πŸ“ Abstract
Sim2Real aims to train policies in high-fidelity simulation environments and effectively transfer them to the real world. Despite the development of accurate simulators and Sim2Real RL approaches, policies trained purely in simulation often suffer significant performance drops when deployed in real environments. This drop is referred to as the Sim2Real performance gap. Current Sim2Real RL methods optimize simulator accuracy and variability as proxies for real-world performance. However, as established theoretically and empirically in the literature, these metrics do not necessarily correlate with the real-world performance of the policy. We propose a novel framework to address this issue by directly adapting the simulator parameters based on real-world performance. We frame this problem as a bi-level RL framework: the inner-level RL trains a policy purely in simulation, and the outer-level RL adapts the simulation model and in-sim reward parameters to maximize the real-world performance of the in-sim policy. We derive and validate in simple examples the mathematical tools needed to develop bi-level RL algorithms that close the Sim2Real performance gap.
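The bi-level structure described above can be illustrated with a minimal toy sketch. This is not the paper's algorithm: the one-dimensional dynamics, the closed-form inner-loop policy, and the hill-climbing outer update are all hypothetical simplifications, chosen only to show the inner loop (train in sim) nested inside the outer loop (adapt the simulator parameter from real-world return).

```python
def rollout(policy, g, x0=1.0, steps=5):
    # Run an episode under toy dynamics x' = x + g * a with reward -x^2.
    # The same function serves as "simulator" or "real world" depending on g.
    x, ret = x0, 0.0
    for _ in range(steps):
        a = policy(x)
        x = x + g * a
        ret += -x * x
    return ret

def train_in_sim(g_sim):
    # Inner loop: for these linear dynamics the return-maximizing policy
    # is available in closed form, a = -x / g_sim (stands in for inner RL).
    return lambda x: -x / g_sim

def outer_loop(g_real, g_sim0=0.5, iters=50, step=0.1):
    # Outer loop: adapt the simulator parameter by hill-climbing on the
    # *real-world* return of the policy trained in sim (hypothetical update
    # rule; the paper develops RL machinery for this level instead).
    g_sim = g_sim0
    best = rollout(train_in_sim(g_sim), g_real)
    for _ in range(iters):
        for cand in (g_sim + step, g_sim - step):
            if cand <= 0:
                continue
            r = rollout(train_in_sim(cand), g_real)
            if r > best:
                best, g_sim = r, cand
    return g_sim, best

g_sim, real_return = outer_loop(g_real=2.0)
```

Starting from a mismatched simulator (g_sim0 = 0.5 vs. g_real = 2.0), the in-sim policy overshoots badly in the real environment; the outer loop drives g_sim toward g_real because that directly maximizes real-world return, with no fidelity proxy involved.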
Problem

Research questions and friction points this paper is trying to address.

Closing the Sim2Real performance gap in policy transfer
Addressing simulator accuracy limitations for real-world deployment
Developing bi-level RL framework for direct simulator adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Directly adapts simulator parameters using real performance
Uses bi-level RL framework with inner and outer loops
Optimizes simulation model and reward parameters jointly
πŸ”Ž Similar Papers
No similar papers found.