🤖 AI Summary
End-to-end autonomous driving faces two key challenges: (1) imitation learning (IL) suffers from poor generalization and lacks post-deployment error correction; (2) reinforcement learning (RL) tends to overfit to hard scenarios, causing catastrophic forgetting and low sample efficiency. To address these, we propose a "Reinforcement Fine-tuning + Self-Aware Expansion" framework, introducing a novel three-stage closed-loop paradigm: (i) general IL pre-training, (ii) residual RL fine-tuning focused exclusively on failure cases, and (iii) plug-and-play self-aware adapters for dynamic, scenario-specific adaptation. This design preserves holistic driving knowledge while enabling targeted optimization for challenging scenarios. Evaluated in both closed-loop simulation and real-world vehicle testing, our method significantly improves long-horizon planning robustness, safety, and cross-scenario generalization, outperforming state-of-the-art end-to-end approaches.
📝 Abstract
End-to-end autonomous driving has emerged as a promising paradigm for directly mapping sensor inputs to planning maneuvers using learning-based modular integrations. However, existing imitation learning (IL)-based models suffer from poor generalization to hard cases and lack a corrective feedback loop after deployment. While reinforcement learning (RL) offers a potential path to handling hard cases optimally, it is often hindered by overfitting to specific driving cases, resulting in catastrophic forgetting of generalizable knowledge and sample inefficiency. To overcome these challenges, we propose Reinforced Refinement with Self-aware Expansion (R2SE), a novel learning pipeline that continually refines the hard-case domain while retaining a generalizable driving policy for model-agnostic end-to-end driving systems. Through reinforcement fine-tuning and policy expansion that facilitate continuous improvement, R2SE features three key components: 1) Generalist Pretraining with hard-case allocation trains a generalist IL driving system while dynamically identifying failure-prone cases for targeted refinement; 2) Residual Reinforced Specialist Fine-tuning optimizes residual corrections with RL to improve performance in the hard-case domain while preserving global driving knowledge; 3) Self-aware Adapter Expansion dynamically integrates specialist policies back into the generalist model, enabling continuous performance improvement. Experimental results in closed-loop simulation and on real-world datasets demonstrate improvements in generalization, safety, and long-horizon policy robustness over state-of-the-art E2E systems, highlighting the effectiveness of reinforced refinement for scalable autonomous driving.
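The core mechanism the abstract describes can be sketched in a few lines: a frozen IL generalist produces the base plan, an RL-trained specialist contributes only a residual correction, and a self-aware gate decides whether the current state is a hard case that warrants applying that correction. The sketch below is a minimal illustration of that decomposition; all function names, weight values, and the norm-based gating criterion are hypothetical assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def generalist_policy(state: np.ndarray) -> np.ndarray:
    """Frozen IL generalist: maps a state vector to a planned action
    (e.g. steering, acceleration). Weights are illustrative stand-ins."""
    W = np.full((2, state.shape[0]), 0.1)
    return W @ state

def specialist_residual(state: np.ndarray) -> np.ndarray:
    """RL-trained residual head: a small correction learned only on
    hard cases, so the generalist's global knowledge stays intact."""
    V = np.full((2, state.shape[0]), 0.02)
    return V @ state

def hard_case_gate(state: np.ndarray, threshold: float = 1.5) -> float:
    """Self-aware gate (hypothetical criterion): flag the state as a
    hard case when its magnitude exceeds a threshold."""
    return 1.0 if np.linalg.norm(state) > threshold else 0.0

def act(state: np.ndarray) -> np.ndarray:
    """Deployed action = generalist plan + gated specialist residual."""
    g = hard_case_gate(state)
    return generalist_policy(state) + g * specialist_residual(state)

easy = np.array([0.2, 0.1, 0.3])  # gate off: generalist alone
hard = np.array([2.0, 1.5, 1.0])  # gate on: generalist + residual
print(act(easy))
print(act(hard))
```

Because the specialist only outputs a residual and is applied behind a gate, disabling it recovers the pretrained generalist exactly, which is what lets the pipeline target failure cases without catastrophic forgetting.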