AI Summary
Model mismatch in sim-to-real transfer severely hinders deployment of robotic policies trained in simulation. Method: This paper proposes the Real-Sim-Real (RSR) closed-loop framework, which jointly optimizes policy and simulator parameters via gradient-based online system identification using differentiable simulation (MuJoCo MJX), dynamically aligning simulated dynamics with real-world behavior. An information-theoretic cost function guides active, diverse real-world data collection to maximize parameter identifiability. Unlike conventional unidirectional sim-to-real pipelines, RSR establishes an iterative "real → sim → real" optimization loop, tightly coupling reinforcement learning (PPO/SAC) with online system identification. Contribution/Results: RSR significantly reduces the sim-to-real performance gap across diverse manipulation tasks. It demonstrates strong robustness to both explicit (e.g., mass, friction) and implicit (e.g., unmodeled contact dynamics) environmental uncertainties, and exhibits superior cross-scenario generalization without task-specific retraining.
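The gradient-based system identification step can be illustrated on a toy problem. The sketch below is not the paper's implementation and does not use MuJoCo MJX. It assumes a hypothetical 1-D damped point mass, hand-derives the gradient of the rollout with respect to a single `damping` parameter, and fits that parameter to a "real" trajectory that is itself generated by the same toy model with a hidden value. All names (`simulate`, `identify_damping`, `TRUE_DAMPING`) and all hyperparameters are illustrative assumptions.

```python
# Toy stand-in for a differentiable simulator (assumption: NOT MuJoCo MJX).
# Dynamics: v[t+1] = v[t] + dt * (u[t] - damping * v[t]) / mass

def simulate(damping, controls, mass=1.0, dt=0.01):
    """Roll out velocities and their sensitivities d v[t] / d damping."""
    v, dv = 0.0, 0.0
    vs, dvs = [], []
    for u in controls:
        # Differentiate the update rule w.r.t. damping (uses the old v and dv).
        dv = dv + dt * (-v - damping * dv) / mass
        v = v + dt * (u - damping * v) / mass
        vs.append(v)
        dvs.append(dv)
    return vs, dvs


def identify_damping(real_vs, controls, init=0.5, lr=1.0, steps=300):
    """Gradient descent on the mean squared sim-vs-real velocity error."""
    damping = init
    for _ in range(steps):
        vs, dvs = simulate(damping, controls)
        # Chain rule: d/d(damping) of mean((v - r)^2) = mean(2 * (v - r) * dv).
        grad = sum(2.0 * (v - r) * dv for v, r, dv in zip(vs, real_vs, dvs)) / len(vs)
        damping -= lr * grad
    return damping


# "Real" data: here just the toy model run with a hidden parameter value.
TRUE_DAMPING = 0.8
controls = [1.0] * 200                      # constant push for 2 seconds
real_vs, _ = simulate(TRUE_DAMPING, controls)

estimate = identify_damping(real_vs, controls)
print(f"estimated damping: {estimate:.4f}")
```

In a full real-sim-real loop, a step like this would alternate with policy training in the updated simulator and with fresh data collection on the robot. An autodiff-capable simulator would supply the gradient in place of the hand-written sensitivity recursion.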
Abstract
The sim-to-real gap remains a critical challenge in robotics, hindering the deployment of algorithms trained in simulation to real-world systems. This paper introduces a novel Real-Sim-Real (RSR) loop framework that leverages differentiable simulation to address this gap by iteratively refining simulation parameters, aligning them with real-world conditions, and enabling robust and efficient policy transfer. A key contribution of our work is the design of an informative cost function that encourages the collection of diverse and representative real-world data, minimizing bias and maximizing the utility of each data point for simulation refinement. This cost function integrates seamlessly into existing reinforcement learning algorithms (e.g., PPO, SAC) and ensures a balanced exploration of critical regions in the real domain. Furthermore, our approach is implemented on the versatile MuJoCo MJX platform, and our framework is compatible with a wide range of robotic systems. Experimental results on several robotic manipulation tasks demonstrate that our method significantly reduces the sim-to-real gap, achieving high task performance and generalizing across diverse scenarios with both explicit and implicit environmental uncertainties.
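The abstract does not spell out the form of the informative cost function. As a rough illustration of what "utility of a data point for simulation refinement" can mean, the sketch below scores candidate control sequences by the Fisher information they carry about a single parameter. This is a standard identifiability proxy assumed here for illustration, not the cost function proposed in the paper. The toy damped point mass, the Gaussian measurement-noise model, and the names `rollout_sensitivity` and `fisher_information` are all hypothetical.

```python
# Assumed toy dynamics: v[t+1] = v[t] + dt * (u[t] - damping * v[t]) / mass
# Assumed observation model: measured velocity = v[t] + Gaussian noise (std = noise_std)

def rollout_sensitivity(damping, controls, mass=1.0, dt=0.01):
    """Return d v[t] / d damping along the rollout driven by `controls`."""
    v, dv = 0.0, 0.0
    sensitivities = []
    for u in controls:
        # Derivative of the update rule w.r.t. damping (uses the old v and dv).
        dv = dv + dt * (-v - damping * dv) / mass
        v = v + dt * (u - damping * v) / mass
        sensitivities.append(dv)
    return sensitivities


def fisher_information(damping, controls, noise_std=0.05):
    """Scalar Fisher information about `damping` under Gaussian noise:
    sum over time of (d v[t] / d damping)^2 / noise_std^2."""
    sens = rollout_sensitivity(damping, controls)
    return sum(s * s for s in sens) / noise_std ** 2


# Compare three candidate data-collection behaviours at the current estimate.
current_estimate = 0.8
idle = [0.0] * 200          # robot never moves: damping is unobservable
gentle_push = [1.0] * 200
strong_push = [2.0] * 200

info_idle = fisher_information(current_estimate, idle)
info_gentle = fisher_information(current_estimate, gentle_push)
info_strong = fisher_information(current_estimate, strong_push)
print(info_idle, info_gentle, info_strong)
```

A score of this kind could, for example, be added as a weighted bonus to the task reward used by PPO or SAC, so that the policy is nudged toward trajectories that excite the uncertain parameters. Whether the paper combines the terms this way is not stated in the abstract.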