CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving

📅 2025-10-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
End-to-end autonomous driving models relying solely on imitation learning (IL) suffer from poor generalization, while pure reinforcement learning (RL) approaches face low sample efficiency and unstable convergence. To address these limitations, we propose a collaborative-competitive dual-policy framework driven by a latent world model, which abandons the conventional two-stage paradigm of IL pretraining followed by RL fine-tuning. Instead, it enables dynamic knowledge exchange and gradient-decoupled optimization between IL and RL agents within a unified architecture. Our key innovation is a competitive policy mechanism operating in the latent space that jointly models behavioral cloning and reward-driven exploration. Extensive end-to-end experiments on the nuScenes dataset demonstrate an 18% reduction in collision rate, along with significantly improved generalization to long-tail, complex driving scenarios and enhanced overall driving performance.

πŸ“ Abstract
End-to-end autonomous driving models trained solely with imitation learning (IL) often suffer from poor generalization. In contrast, reinforcement learning (RL) promotes exploration through reward maximization but faces challenges such as sample inefficiency and unstable convergence. A natural solution is to combine IL and RL. Moving beyond the conventional two-stage paradigm (IL pretraining followed by RL fine-tuning), we propose CoIRL-AD, a competitive dual-policy framework that enables IL and RL agents to interact during training. CoIRL-AD introduces a competition-based mechanism that facilitates knowledge exchange while preventing gradient conflicts. Experiments on the nuScenes dataset show an 18% reduction in collision rate compared to baselines, along with stronger generalization and improved performance on long-tail scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.
Problem

Research questions and friction points this paper is trying to address.

Combining imitation and reinforcement learning for autonomous driving
Addressing poor generalization in end-to-end driving models
Resolving sample inefficiency and unstable convergence in RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines imitation and reinforcement learning interactively
Uses competitive dual-policy framework for knowledge exchange
Prevents gradient conflicts through competition-based mechanism
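The competition-based interaction above can be sketched in miniature: two independent policy heads act on the same latent state, the higher-reward action "wins" each step, and each head updates only its own parameters, so gradients never mix across heads. Everything below (the linear heads, the toy reward, the update rules, and all names) is an illustrative assumption, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, ACTION_DIM, LR = 8, 2, 1e-2

# Two independent linear policy heads over a shared latent state
# (hypothetical stand-ins for the paper's IL and RL agents).
W_il = rng.normal(scale=0.1, size=(ACTION_DIM, LATENT_DIM))
W_rl = rng.normal(scale=0.1, size=(ACTION_DIM, LATENT_DIM))

def reward(action, expert_action):
    # Toy reward: negative distance to the expert action.
    return -np.linalg.norm(action - expert_action)

def train_step(z, expert_action):
    global W_il, W_rl
    a_il, a_rl = W_il @ z, W_rl @ z
    # Competition: the head whose action scores higher wins this step.
    winner = a_il if reward(a_il, expert_action) >= reward(a_rl, expert_action) else a_rl
    # Gradient-decoupled updates: each head regresses toward its own
    # target, and neither loss touches the other head's parameters.
    # IL head: behavioral cloning toward the expert action.
    grad_il = np.outer(a_il - expert_action, z)
    # RL head: nudged toward the winning action (a crude proxy for
    # reward-guided knowledge exchange from the competition).
    grad_rl = np.outer(a_rl - winner, z)
    W_il -= LR * grad_il
    W_rl -= LR * grad_rl
    return reward(winner, expert_action)

z = rng.normal(size=LATENT_DIM)
expert = np.array([1.0, -0.5])
scores = [train_step(z, expert) for _ in range(200)]
```

In this toy setup the winner's score improves over training; the point is only that competition lets the weaker head borrow from the stronger one without the two losses sharing gradients.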
Xiaoji Zheng
Tsinghua University
Ziyuan Yang
The Chinese University of Hong Kong
CV · Medical Imaging · Security & Privacy · Efficient Learning
Yanhao Chen
Beijing Jiaotong University
Yuhang Peng
The Hong Kong Polytechnic University
Yuanrong Tang
Tsinghua University
Gengyuan Liu
Tsinghua University
Bokui Chen
Tsinghua University
Jiangtao Gong
Institute for AI Industry Research (AIR), Tsinghua University
Human-Computer Interaction · Human-AI Collaboration · Robotics · Mixed Reality