AI Summary
Existing supervised open-loop training struggles to model dynamic multi-agent interactions in complex driving scenarios, resulting in simulations that lack realism and controllability. This work proposes a reinforcement-learning-based fine-tuning framework that builds upon a pretrained traffic simulation model and introduces a low-variance dense reward function balancing fidelity and controllability. For the first time, this approach aligns simulated trajectories with real-world traffic distributions while enabling target-conditioned, controllable scenario generation. Evaluated on the Waymo Open Motion Dataset, the method achieves state-of-the-art simulation realism, significantly reduces sample consumption compared to heuristic fine-tuning, and enables efficient, controllable multi-agent traffic simulation.
Abstract
Supervised open-loop training has been widely adopted for training traffic simulation models; however, it fails to capture the inherently dynamic, multi-agent interactions common in complex driving scenarios. We introduce RLFTSim, a reinforcement-learning-based fine-tuning framework that enhances scenario realism by aligning simulator rollouts with real-world data distributions and provides a method for distilling goal-conditioned controllability in scenario generation. We instantiate RLFTSim on top of a pre-trained simulation model, design a reward that balances fidelity and controllability, and perform comprehensive experiments on the Waymo Open Motion Dataset. Our results show improvements in realism, achieving state-of-the-art performance. Compared with other heuristic search-based fine-tuning methods, RLFTSim requires significantly fewer samples due to a proposed low-variance and dense reward signal, and it directly addresses the realism alignment issue by design. We also demonstrate the effectiveness of our approach for distilling traffic simulation controllability through goal conditioning. The project page is available at https://ehsan-ami.github.io/rlftsim.
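To make the idea of a dense reward that balances fidelity and controllability more concrete, here is a minimal sketch. It is not the reward used by RLFTSim, which the abstract does not specify. The function name `dense_reward`, the weights `w_fidelity` and `w_goal`, and both reward terms (distance to the logged trajectory, and progress toward a goal point) are assumptions chosen purely for illustration.

```python
import math

def dense_reward(sim_xy, log_xy, goal_xy, w_fidelity=1.0, w_goal=0.5):
    """Per-step rewards for one agent.

    Hypothetical illustration only, not the paper's reward:
      sim_xy  - simulated (x, y) positions, one per time step
      log_xy  - logged real-world positions at the same time steps
      goal_xy - goal position used for goal conditioning
    """
    rewards = []
    for t in range(1, len(sim_xy)):
        # Fidelity term (assumed): penalize distance to the logged position.
        fidelity = -math.dist(sim_xy[t], log_xy[t])
        # Controllability term (assumed): progress toward the goal since t-1.
        progress = math.dist(sim_xy[t - 1], goal_xy) - math.dist(sim_xy[t], goal_xy)
        rewards.append(w_fidelity * fidelity + w_goal * progress)
    return rewards

# A rollout that matches the log and moves one unit toward the goal per step.
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(dense_reward(track, track, goal_xy=(2.0, 0.0)))
```

Because a reward is emitted at every step rather than once per rollout, each action receives its own learning signal. That is the general sense in which a dense reward can reduce the number of samples needed relative to a sparse, end-of-episode score.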