SafeAlign-VLA: A Negative-Enhanced Safe Alignment Framework for Risk-Aware Autonomous Driving

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalization of end-to-end autonomous driving systems in long-tail, safety-critical scenarios, where existing vision-language-action (VLA) approaches rely mostly on positive expert demonstrations and lack explicit modeling of risky behaviors. To overcome this limitation, the authors propose SafeAlign-VLA, a framework that integrates negative samples into VLA alignment training. The method uses counterfactual reasoning to generate structured safety labels and counterfactual positive trajectories from risky scenarios, and introduces anchor-based group relative policy optimization, in which positive and negative trajectories serve as contrastive anchors. With a two-stage training strategy, SafeAlign-VLA achieves a PDMS of 89.1 on NAVSIM v1 (1.3% above the baseline trained without negative data), reduces the collision rate to 3.36% on DeepAccident, and attains language and risk prediction accuracies of 84.2% and 85.8%, respectively.

📝 Abstract

End-to-end autonomous driving systems excel in common scenarios but struggle with safety-critical long-tail cases. Vision-Language-Action (VLA) models are promising due to their strong reasoning capabilities. However, most VLA-based approaches rely on positive expert demonstrations and rarely exploit negative samples, leading to insufficient understanding of risky behaviors and safety boundaries. To address this limitation, we propose SafeAlign-VLA, a unified negative-enhanced safe alignment framework that incorporates negative data into both supervised learning and reinforcement learning. First, we develop a counterfactual safety pairing paradigm that generates structured safety labels and counterfactual positive trajectories from risky scenarios via counterfactual reasoning. Then, a two-stage training strategy is adopted: negative-enhanced supervised fine-tuning for failure feedback and trajectory correction, followed by anchor-based group relative policy optimization that uses positive and negative trajectories as contrastive anchors to steer sampling and penalize high-risk behaviors via group-relative advantages. Experiments on NAVSIM and DeepAccident validate the proposed framework. SafeAlign-VLA achieves 89.1 PDMS on the NAVSIM v1 test set, improving over the baseline without negative data by 1.3%. On DeepAccident, it reduces the collision rate to 3.36%, while achieving 84.2% language accuracy and 85.8% risk prediction accuracy. These results demonstrate the effectiveness of the proposed negative-enhanced safe alignment framework for safe and robust autonomous driving.
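
The abstract does not spell out how the group-relative advantages are computed, so the following is only a minimal sketch of the general idea, assuming the common GRPO-style standardization of rewards within a group. The function names, the choice to place one positive and one negative anchor in the same group as the sampled trajectories, and the reward values are all hypothetical illustrations, not the authors' formulation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group: A_i = (r_i - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def anchored_advantages(sampled_rewards, positive_anchor_reward, negative_anchor_reward):
    """Hypothetical helper: append a positive and a negative anchor to the
    sampled group, then compute advantages relative to the enlarged group."""
    group = list(sampled_rewards) + [positive_anchor_reward, negative_anchor_reward]
    advantages = group_relative_advantages(group)
    n = len(sampled_rewards)
    return advantages[:n], advantages[n], advantages[n + 1]


# Made-up rewards for three sampled trajectories, a safe (positive) anchor
# scored 1.0, and a risky (negative) anchor scored 0.0.
sampled_adv, pos_adv, neg_adv = anchored_advantages([0.6, 0.8, 0.4], 1.0, 0.0)
print(sampled_adv, pos_adv, neg_adv)
```

In this sketch the negative anchor lowers the group mean's reference point from below and receives a negative advantage, while the positive anchor receives a positive one, so a policy-gradient update weighted by these values would be pushed away from the risky trajectory and toward the safe one.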

Problem

Research questions and friction points this paper is trying to address.

autonomous driving

Vision-Language-Action

negative samples

safety-critical

long-tail scenarios

Innovation

Methods, ideas, or system contributions that make the work stand out.

negative-enhanced learning

counterfactual reasoning

Vision-Language-Action (VLA)