CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies

📅 2026-05-05

📝 Abstract
Open-loop imitation learning has advanced modern autonomous driving policy architectures, but closed-loop deployment remains vulnerable to policy-induced distribution shift. Existing post-training paradigms exhibit fundamental trade-offs: closed-loop RL fine-tuning provides grounded feedback from executed actions but is constrained by the sparsity of informative events, whereas counterfactual fine-tuning provides dense supervision over candidate futures but inherits bias from imperfect future estimates. We introduce Counterfactual-to-Interactive Reinforcement Fine-Tuning (CRAFT), an on-policy framework that formulates closed-loop post-training as proxy-residual optimization. CRAFT uses group-normalized counterfactual advantages as a dense proxy for real closed-loop advantages and aligns this proxy with the closed-loop world through grounded residual correction from interaction-critical events. To stabilize adaptation, CRAFT regularizes the online policy toward an EMA teacher via asymmetric KL self-distillation. Theoretically, CRAFT decomposes the real closed-loop policy gradient into proxy and residual terms under the same visited-state distribution, reducing residual variance with an aligned proxy while mitigating proxy bias through grounded residual approximation. Empirically, CRAFT achieves the strongest closed-loop gains on Bench2Drive across hierarchical planning, vision-language-action, and vocabulary-scoring architectures. Ablations, scaling behavior, stability analyses, and transfer results further validate the complementary roles of dense counterfactual proxy and grounded residual correction. Project page: https://currychen77.github.io/CRAFT.
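The abstract names two mechanisms without giving formulas: group-normalized advantages over candidate futures, and an EMA teacher that the online policy is regularized toward. The sketch below shows the generic form of each in plain Python. It is not taken from the paper; the function names, the `eps` and `decay` parameters, and the example scores are all hypothetical, and the residual correction and KL terms are not modeled.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards, eps=1e-8):
    """Center and scale the scores of one group of candidate futures.

    Assumption: every score in `rewards` belongs to a candidate rolled out
    from the same state, so the group mean acts as the baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def ema_update(teacher, student, decay=0.99):
    """Move each teacher parameter a small step toward the student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Hypothetical scores for four candidate trajectories from one state.
proxy_adv = group_normalized_advantages([1.0, 2.0, 3.0, 4.0])

# Hypothetical two-parameter policy; the teacher trails the student.
teacher = ema_update([0.0, 1.0], [1.0, 1.0], decay=0.9)
```

Normalizing within a group keeps the advantages zero-mean, so candidates are ranked only against their siblings from the same state. A `decay` close to 1 makes the teacher change slowly, which is the usual reason for using it as a stable regularization target.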
Problem

Research questions and friction points this paper is trying to address.

distribution shift
closed-loop reinforcement learning
counterfactual reasoning
policy fine-tuning
autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

counterfactual fine-tuning
closed-loop reinforcement learning
proxy-residual optimization
asymmetric KL distillation
distribution shift
🔎 Similar Papers
2024-04-12
2024 IEEE Intelligent Vehicles Symposium (IV)
Citations: 8
Keyu Chen
Tsinghua University
Autonomous driving, Traffic Simulation
Nanfei Ye
Li Auto Inc
Yida Wang
Li Auto Inc.
Computer Vision, Machine Learning, Computer Graphics, Autonomous Driving
Wenchao Sun
School of Vehicle and Mobility, Tsinghua University
Danqi Zhao
School of Vehicle and Mobility, Tsinghua University
Hao Cheng
Tsinghua University
Safety Evaluation of AVs
Sifa Zheng
School of Vehicle and Mobility, Tsinghua University