DRARL: Disengagement-Reason-Augmented Reinforcement Learning for Efficient Improvement of Autonomous Driving Policy

๐Ÿ“… 2025-06-20
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
In autonomous driving, sparse and noisy human takeover dataโ€”often contaminated by non-strategic interventions (e.g., driver-initiated takeovers unrelated to policy failure)โ€”hinder efficient policy optimization. Method: This paper proposes a causal-aware reinforcement learning framework grounded in takeover cause identification. It introduces takeover causes as semantic labels into RL; employs an out-of-distribution (OOD) state estimation model to precisely identify policy-deficiency-driven takeovers; constructs a cause-guided imagination environment for attribution-driven, targeted policy improvement; and designs a causal-aware policy update mechanism to prevent over-conservatism. Contribution/Results: Evaluated on real-world robotaxi takeover data, the framework achieves significantly improved accuracy in identifying policy-relevant takeovers, enhanced generalization across semantically similar scenarios, and a 42% gain in policy iteration efficiency over baseline methods.

๐Ÿ“ Abstract
With the increasing presence of automated vehicles on open roads under driver supervision, disengagement cases are becoming more prevalent. While some data-driven planning systems attempt to use these disengagement cases directly for policy improvement, the inherent scarcity of disengagement data (often occurring as single instances) limits training effectiveness. Furthermore, some disengagement data should be excluded, since a disengagement does not always stem from a failure of the driving policy, e.g., the driver may casually intervene for a while. To this end, this work proposes disengagement-reason-augmented reinforcement learning (DRARL), which enhances the policy improvement process according to the reason behind each disengagement case. Specifically, the disengagement reason is identified by an out-of-distribution (OOD) state estimation model. When no such reason exists, the case is classified as a casual disengagement, which requires no additional policy adjustment. Otherwise, the policy is updated in a reason-augmented imagination environment, improving performance on disengagement cases with similar reasons. The method is evaluated on real-world disengagement cases collected by an autonomous driving robotaxi. Experimental results demonstrate that the method accurately identifies policy-related disengagement reasons, allowing the agent to handle both the original cases and semantically similar ones through reason-augmented training. Furthermore, the approach prevents the agent from becoming overly conservative after policy adjustments. Overall, this work provides an efficient way to improve driving policy performance using disengagement cases.
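The abstract's OOD-based reason identification can be sketched as follows. The paper does not specify its estimator, so this is a minimal illustration assuming a Mahalanobis-distance detector over state features; the class name, threshold, and decision rule are all hypothetical stand-ins, not the authors' implementation:

```python
import numpy as np

class OODStateEstimator:
    """Illustrative OOD scorer: flags states far from the training
    distribution via Mahalanobis distance, standing in for the paper's
    unspecified OOD state estimation model."""

    def __init__(self, train_states, threshold=3.0):
        # Fit a Gaussian to the in-distribution (training) states.
        self.mean = train_states.mean(axis=0)
        cov = np.cov(train_states, rowvar=False)
        # Small ridge term keeps the covariance invertible.
        self.inv_cov = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
        self.threshold = threshold

    def score(self, state):
        """Mahalanobis distance of a state from the training distribution."""
        d = state - self.mean
        return float(np.sqrt(d @ self.inv_cov @ d))

    def is_policy_related(self, state):
        # In-distribution states at disengagement suggest a casual takeover;
        # OOD states suggest a policy-deficiency-driven disengagement.
        return self.score(state) > self.threshold
```

A disengagement whose state scores in-distribution would be treated as casual and skipped, matching the abstract's "when no such reason exists" branch.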
Problem

Research questions and friction points this paper is trying to address.

Addresses scarcity of disengagement data in autonomous driving
Identifies and filters irrelevant disengagement cases
Enhances policy training with reason-augmented imagination
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses OOD state estimation for disengagement reason identification
Augments training with reason-based imagination environment
Filters casual disengagements to focus policy updates
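The filtering step listed above can be sketched as a simple gate over collected cases. The function name, case schema, and threshold below are hypothetical; the paper's actual pipeline is not specified:

```python
def filter_policy_related(cases, ood_score, threshold=3.0):
    """Partition disengagement cases by an OOD score: cases scoring above the
    threshold are kept as policy-related training material, while casual
    takeovers are dropped so they cannot push the policy toward
    over-conservative behavior. `ood_score` is any callable mapping a state
    to a scalar novelty score (an assumption, not the paper's API)."""
    kept, dropped = [], []
    for case in cases:
        bucket = kept if ood_score(case["state"]) > threshold else dropped
        bucket.append(case)
    return kept, dropped
```

Only the `kept` cases would then feed the reason-augmented imagination environment for targeted policy updates.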
๐Ÿ”Ž Similar Papers
No similar papers found.
Weitao Zhou
Tsinghua University
Autonomous Driving · Reinforcement Learning
Bo Zhang
School of Vehicle and Mobility, Tsinghua University; Didi Global
Zhong Cao
University of Michigan
Autonomous Vehicle · Reinforcement Learning
Xiang Li
Lab for High Technology, Tsinghua University
Qian Cheng
University of Leeds
Sustainable Development · Colour Science
Chunyang Liu
Didi Chuxing
Data Mining · Marketplace · Autonomous Driving
Yaqin Zhang
Institute for AI Industry Research, Tsinghua University
Diange Yang
School of Vehicle and Mobility, Tsinghua University