GSDrive: Reinforcing Driving Policies by Multi-mode Trajectory Probing with 3D Gaussian Splatting Environment

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems in end-to-end autonomous driving: policies trained with sparse reward signals often converge to suboptimal solutions, and real-world deployment is hindered by high annotation costs and data quality that degrades over time. The authors propose GSDrive, a framework that combines imitation learning and reinforcement learning. It uses 3D Gaussian Splatting (3DGS) to build a differentiable, physics-based simulation environment and introduces multi-mode trajectory probing, in which candidate trajectories are rolled out in the simulator to produce dense, immediate reward signals. This design enables bidirectional knowledge exchange between imitation and reinforcement learning and avoids relying on sparse, event-based rewards. In closed-loop experiments on the reconstructed nuScenes dataset, the method surpasses existing simulation-based reinforcement learning approaches to driving policy learning.

📝 Abstract

End-to-end (E2E) autonomous driving presents a promising approach for translating perceptual inputs directly into driving actions. However, prohibitive annotation costs and temporal data quality degradation hinder long-term real-world deployment. While combining imitation learning (IL) and reinforcement learning (RL) is a common strategy for policy improvement, conventional RL training relies on delayed, event-based rewards: policies learn only from catastrophic outcomes such as collisions, leading to premature convergence to suboptimal behaviors. To address these limitations, we introduce GSDrive, a framework that exploits 3D Gaussian Splatting (3DGS) for differentiable, physics-based reward shaping in E2E driving policy improvement. Our method incorporates a flow matching-based trajectory predictor within the 3DGS simulator, enabling multi-mode trajectory probing where candidate trajectories are rolled out to assess prospective rewards. This establishes a bidirectional knowledge exchange between IL and RL by grounding reward functions in physically simulated interaction signals, offering immediate dense feedback instead of sparse catastrophic events. Evaluated on the reconstructed nuScenes dataset, our method surpasses existing simulation-based RL driving approaches in closed-loop experiments. Code is available at https://github.com/ZionGo6/GSDrive.
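
The contrast between sparse, event-based rewards and dense feedback from probing several candidate trajectories can be illustrated with a toy example. The sketch below is not the GSDrive implementation: it replaces the 3DGS simulator with a 2D point-obstacle world, replaces the flow matching predictor with three hand-written candidates, and uses a made-up reward (progress toward a goal minus a clearance penalty). All function names, thresholds, and coordinates are hypothetical.

```python
import math

def sparse_reward(trajectory, obstacles, collision_dist=0.5):
    """Event-based reward (assumed form): -1 if any waypoint collides, else 0."""
    for point in trajectory:
        if any(math.dist(point, obs) < collision_dist for obs in obstacles):
            return -1.0
    return 0.0

def dense_reward(trajectory, obstacles, goal, safe_dist=2.0):
    """Per-step shaped reward (assumed form): progress minus clearance penalty."""
    total = 0.0
    prev = trajectory[0]
    for point in trajectory[1:]:
        # Progress: how much closer to the goal this step brought us.
        progress = math.dist(prev, goal) - math.dist(point, goal)
        # Penalty: grows linearly once clearance drops below safe_dist.
        clearance = min((math.dist(point, obs) for obs in obstacles),
                        default=math.inf)
        penalty = max(0.0, safe_dist - clearance)
        total += progress - penalty
        prev = point
    return total

def probe(candidates, obstacles, goal):
    """Roll out every candidate mode and return (best index, all rewards)."""
    rewards = [dense_reward(traj, obstacles, goal) for traj in candidates]
    best = max(range(len(candidates)), key=rewards.__getitem__)
    return best, rewards

# Toy scene: one obstacle directly on the straight path to the goal.
obstacles = [(5.0, 0.0)]
goal = (10.0, 0.0)
candidates = [
    [(0, 0), (2.5, 0.0), (5.0, 0.0), (7.5, 0.0), (10, 0)],    # straight
    [(0, 0), (2.5, 1.5), (5.0, 3.0), (7.5, 1.5), (10, 0)],    # wide swerve
    [(0, 0), (2.5, -0.5), (5.0, -1.0), (7.5, -0.5), (10, 0)], # tight swerve
]

best, dense = probe(candidates, obstacles, goal)
sparse = [sparse_reward(traj, obstacles) for traj in candidates]
print("dense:", dense, "sparse:", sparse, "best mode:", best)
```

In this toy scene the sparse reward only separates the colliding trajectory from the other two, while the dense reward also ranks the two collision-free swerves by how much clearance they keep. That ranking is the kind of immediate signal the abstract attributes to probing, here obtained without any learned components.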

Problem

Research questions and friction points this paper is trying to address.

end-to-end autonomous driving

reinforcement learning

sparse rewards

imitation learning

policy optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting

multi-mode trajectory probing

differentiable simulation