🤖 AI Summary
This work addresses the difficulty of combining high-fidelity simulation with scalable reinforcement learning when learning control policies for autonomous driving. The authors propose Sim2Sim2Sim, a framework that uses dynamics distillation to build an intermediate simulation environment suited to reinforcement learning. Specifically, a highly parallelizable learned dynamics model is distilled from a high-fidelity vehicle simulator; control policies are trained purely within this distilled environment and then deployed back into the source simulator. The framework assesses the distilled dynamics not only by predictive accuracy but also by the quality of the policies they enable. The authors report more efficient policy optimization and reliable transfer back to the source simulator under challenging vehicle dynamics.
📝 Abstract
Robust control policy learning for autonomous driving requires training environments to be both physically realistic and computationally scalable, properties that existing simulators provide only in isolation. We introduce Sim2Sim2Sim, a framework that bridges high-fidelity vehicle simulation and scalable reinforcement learning by distilling simulator dynamics into a highly parallelizable learned dynamics model. By training control policies purely within this distilled environment and deploying them back into the high-fidelity source simulator, we demonstrate more efficient policy optimization and reliable transfer under challenging dynamics. We further show that predictive accuracy alone does not fully characterize a learned dynamics model's suitability as a reinforcement learning training environment, which should also be assessed by the quality of the policies it enables.
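The distill, train, and transfer loop described in the abstract can be illustrated with a minimal toy sketch. Nothing here comes from the paper's implementation; every name and constant is an assumption made for illustration. The "source simulator" is a hypothetical 1-D vehicle with quadratic drag, the distilled model is a plain least-squares linear fit, and batched random search over feedback gains stands in for reinforcement learning. The sketch reports both one-step prediction error and the cost of the transferred policy, mirroring the point that a distilled model should be judged on both.

```python
import numpy as np

DT = 0.1  # integration step of the toy simulator (assumed value)


def source_step(state, action):
    """Toy stand-in for a high-fidelity simulator: 1-D vehicle with quadratic drag.

    state: (..., 2) array of [position error, velocity]; action: (...,) acceleration.
    """
    x, v = state[..., 0], state[..., 1]
    a = np.clip(action, -1.0, 1.0)
    v_next = v + DT * (a - 0.05 * v * np.abs(v))
    x_next = x + DT * v
    return np.stack([x_next, v_next], axis=-1)


def distill_dynamics(n_samples, rng):
    """Fit a linear model s' ~ [s, a, 1] @ W on transitions sampled from the source."""
    states = rng.uniform(-2.0, 2.0, size=(n_samples, 2))
    actions = rng.uniform(-1.0, 1.0, size=n_samples)
    targets = source_step(states, actions)
    features = np.column_stack([states, actions, np.ones(n_samples)])
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return weights  # shape (4, 2)


def learned_step(weights, state, action):
    """Distilled dynamics: one matrix product, so it batches trivially."""
    a = np.clip(action, -1.0, 1.0)
    features = np.concatenate(
        [state, a[..., None], np.ones_like(a)[..., None]], axis=-1
    )
    return features @ weights


def rollout_cost(step_fn, gains, horizon=100):
    """Roll out policies a = -k1*x - k2*v for a batch of gains; return one cost each."""
    gains = np.atleast_2d(gains)
    state = np.tile(np.array([1.0, 0.0]), (gains.shape[0], 1))
    cost = np.zeros(gains.shape[0])
    for _ in range(horizon):
        action = -np.sum(gains * state, axis=-1)
        state = step_fn(state, action)
        cost += state[:, 0] ** 2 + 0.1 * state[:, 1] ** 2
    return cost


rng = np.random.default_rng(0)

# 1) Distill the source simulator into a cheap learned model.
weights = distill_dynamics(2000, rng)

# 2) Train purely in the distilled model: evaluate 256 candidate policies in one batch.
#    Random search is a stand-in for RL here, not the paper's method.
candidates = rng.uniform(0.0, 3.0, size=(256, 2))
model_costs = rollout_cost(lambda s, a: learned_step(weights, s, a), candidates)
best_gains = candidates[np.argmin(model_costs)]

# 3) Transfer the selected policy back to the source simulator.
transfer_cost = rollout_cost(source_step, best_gains)[0]
baseline_cost = rollout_cost(source_step, np.zeros(2))[0]  # do-nothing policy

# Two separate views of the distilled model: prediction error and policy quality.
test_states = rng.uniform(-2.0, 2.0, size=(500, 2))
test_actions = rng.uniform(-1.0, 1.0, size=500)
err = learned_step(weights, test_states, test_actions) - source_step(
    test_states, test_actions
)
pred_rmse = float(np.sqrt(np.mean(err ** 2)))

print("one-step RMSE:", pred_rmse)
print("transfer cost:", transfer_cost, "baseline cost:", baseline_cost)
```

In this toy setting the two metrics happen to agree, since the linear fit is close to the true dynamics. The abstract's claim is that they can diverge for realistic vehicle dynamics, which is why the transferred policy's cost is computed in the source simulator rather than inferred from prediction error.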