Advancing Multi-agent Traffic Simulation via R1-Style Reinforcement Fine-Tuning

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address insufficient generalization in supervised learning for multi-agent traffic simulation, caused by the distribution shift between training and testing, this paper proposes SMART-R1, a reinforcement fine-tuning (RFT) framework for next-token prediction motion models inspired by the R1 paradigm. SMART-R1 combines a metric-oriented policy optimization algorithm with an iterative "SFT-RFT-SFT" scheme that alternates supervised fine-tuning (SFT) and RFT, aligning simulated agent behavior with human driving preferences and evaluation metrics at scale. On the Waymo Open Sim Agents Challenge, SMART-R1 achieves a state-of-the-art composite realism meta-score of 0.7858, ranking first on the leaderboard at the time of submission.

📝 Abstract
Scalable and realistic simulation of multi-agent traffic behavior is critical for advancing autonomous driving technologies. Although existing data-driven simulators have made significant strides in this domain, they predominantly rely on supervised learning to align simulated distributions with real-world driving scenarios. A persistent challenge, however, lies in the distributional shift that arises between training and testing, which often undermines model generalization in unseen environments. To address this limitation, we propose SMART-R1, a novel R1-style reinforcement fine-tuning paradigm tailored for next-token prediction models to better align agent behavior with human preferences and evaluation metrics. Our approach introduces a metric-oriented policy optimization algorithm to improve distribution alignment and an iterative "SFT-RFT-SFT" training strategy that alternates between Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT) to maximize performance gains. Extensive experiments on the large-scale Waymo Open Motion Dataset (WOMD) validate the effectiveness of this simple yet powerful R1-style training framework in enhancing foundation models. The results on the Waymo Open Sim Agents Challenge (WOSAC) showcase that SMART-R1 achieves state-of-the-art performance with an overall realism meta score of 0.7858, ranking first on the leaderboard at the time of submission.
Problem

Research questions and friction points this paper is trying to address.

Addresses distribution shift in multi-agent traffic simulation
Improves model generalization for unseen driving environments
Aligns simulated agent behavior with human preferences
Innovation

Methods, ideas, or system contributions that make the work stand out.

R1-style reinforcement fine-tuning for next-token prediction
Metric-oriented policy optimization for distribution alignment
Iterative SFT-RFT-SFT strategy alternating supervised and reinforcement fine-tuning
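The two mechanisms above can be sketched in miniature. The snippet below is an illustrative toy, not the paper's implementation: a tabular softmax policy stands in for the next-token trajectory model, the realism metric is a made-up 0/1 score, and the RFT step uses a group-relative advantage (in the GRPO/R1 style) as one plausible reading of "metric-oriented policy optimization". All names and hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular policy over V candidate motion tokens (stand-in for a
# next-token trajectory model).
V = 4
logits = np.zeros(V)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sft_step(expert_token, lr=0.5):
    """Supervised step: push probability mass toward the logged expert token."""
    global logits
    p = softmax(logits)
    grad = -p
    grad[expert_token] += 1.0          # d log p(expert) / d logits
    logits += lr * grad

def rft_step(metric, group_size=8, lr=0.5):
    """Metric-guided step (GRPO-style): sample a group of rollouts, score
    each with the realism metric, and reinforce tokens whose score beats
    the group mean (a critic-free baseline)."""
    global logits
    p = softmax(logits)
    samples = rng.choice(V, size=group_size, p=p)
    rewards = np.array([metric(s) for s in samples], dtype=float)
    adv = rewards - rewards.mean()
    for tok, a in zip(samples, adv):
        grad = -p
        grad[tok] += 1.0
        logits += lr * a * grad / group_size

# Hypothetical realism metric: token 2 is the "human-like" behavior.
metric = lambda tok: 1.0 if tok == 2 else 0.0

# The iterative SFT -> RFT -> SFT schedule, in miniature.
for _ in range(50):
    sft_step(expert_token=2)
for _ in range(200):
    rft_step(metric)
for _ in range(50):
    sft_step(expert_token=2)

print(softmax(logits).argmax())  # the policy should come to prefer token 2
```

The group-mean baseline is what makes the RFT step cheap: no learned value function is needed, only repeated rollouts scored by the same metric the benchmark evaluates, which is the sense in which optimization is "metric-oriented".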