Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail

📅 2025-10-29
🤖 AI Summary
To address insufficient generalization in safety-critical long-tail driving scenarios, where supervision is sparse and causal understanding is weak, this paper proposes Alpamayo-R1 (AR1), an end-to-end vision-language-action framework that integrates causal reasoning with trajectory planning. The contributions are threefold: (1) the Chain of Causation (CoC) dataset, which provides decision-grounded, causally linked reasoning traces aligned with driving behaviors; (2) a modular architecture coupling the Cosmos-Reason vision-language model with a diffusion-based trajectory decoder; and (3) a multi-stage training strategy combining supervised fine-tuning with reinforcement learning to align causal reasoning with action execution. Experiments demonstrate significant improvements: +12% planning accuracy on challenging cases, a 35% reduction in off-road rate and a 25% reduction in close-encounter rate in closed-loop simulation, and +45% causal reasoning quality, substantially enhancing safety and robustness in long-tail scenarios.

📝 Abstract
End-to-end architectures trained via imitation learning have advanced autonomous driving by scaling model size and data, yet performance remains brittle in safety-critical long-tail scenarios where supervision is sparse and causal understanding is limited. To address this, we introduce Alpamayo-R1 (AR1), a vision-language-action (VLA) model that integrates Chain of Causation reasoning with trajectory planning to enhance decision-making in complex driving scenarios. Our approach features three key innovations: (1) the Chain of Causation (CoC) dataset, built through a hybrid auto-labeling and human-in-the-loop pipeline producing decision-grounded, causally linked reasoning traces aligned with driving behaviors; (2) a modular VLA architecture combining Cosmos-Reason, a Vision-Language Model pre-trained for Physical AI applications, with a diffusion-based trajectory decoder that generates dynamically feasible plans in real time; (3) a multi-stage training strategy using supervised fine-tuning to elicit reasoning and reinforcement learning (RL) to optimize reasoning quality via large reasoning model feedback and enforce reasoning-action consistency. Evaluation shows AR1 achieves up to a 12% improvement in planning accuracy on challenging cases compared to a trajectory-only baseline, with a 35% reduction in off-road rate and a 25% reduction in close-encounter rate in closed-loop simulation. RL post-training improves reasoning quality by 45% as measured by a large reasoning model critic, and reasoning-action consistency by 37%. Model scaling from 0.5B to 7B parameters shows consistent improvements. On-vehicle road tests confirm real-time performance (99 ms latency) and successful urban deployment. By bridging interpretable reasoning with precise control, AR1 demonstrates a practical path towards Level 4 autonomous driving. We plan to release AR1 models and a subset of the CoC dataset in a future update.
Problem

Research questions and friction points this paper is trying to address.

Enhancing decision-making in complex autonomous driving scenarios
Addressing brittleness in safety-critical long-tail driving situations
Bridging causal reasoning with real-time trajectory planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Chain of Causation reasoning with trajectory planning
Combines Vision-Language Model with diffusion-based trajectory decoder
Uses multi-stage training with fine-tuning and reinforcement learning
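The modular VLA pipeline above (a vision-language model producing a conditioning embedding, followed by a diffusion-based decoder that iteratively denoises a waypoint trajectory) can be sketched in toy form. Everything here, including the function names, tensor shapes, and the linear "target" plan, is a hypothetical illustration of the general technique, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def vlm_encode(camera_frames: np.ndarray) -> np.ndarray:
    """Toy stand-in for the VLM encoder: pool pixels into a fixed-size
    conditioning embedding via a random projection."""
    return camera_frames.mean(axis=(0, 1)) @ rng.standard_normal((3, 64))

def denoise_step(traj: np.ndarray, cond: np.ndarray, t: float) -> np.ndarray:
    """One reverse-diffusion step: nudge the noisy trajectory toward a
    conditioning-dependent target (a straight-line plan, for illustration)."""
    target = np.outer(np.linspace(0, 1, len(traj)), cond[:2])
    return traj + t * (target - traj)

def plan_trajectory(camera_frames: np.ndarray, steps: int = 10) -> np.ndarray:
    """Encode the scene once, then refine a noisy trajectory coarse-to-fine."""
    cond = vlm_encode(camera_frames)
    traj = rng.standard_normal((20, 2))      # 20 noisy (x, y) waypoints
    for t in np.linspace(0.1, 1.0, steps):   # increasing denoising strength
        traj = denoise_step(traj, cond, t)
    return traj

frames = rng.random((4, 224, 3))             # fake multi-frame camera input
waypoints = plan_trajectory(frames)
print(waypoints.shape)                       # (20, 2)
```

The structural point the sketch captures is the modularity: the (expensive) VLM encoding runs once per scene, while the lightweight decoder iterates, which is what makes real-time trajectory generation plausible in such a design.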
👥 Authors
Yan Wang (NVIDIA)
Wenjie Luo (Nanyang Technological University)
Junjie Bai (NVIDIA)
Yulong Cao (Research Scientist, NVIDIA Research; Ph.D., University of Michigan)
Tong Che (NVIDIA)
Ke Chen (NVIDIA)
Yuxiao Chen (NVIDIA)
Jenna Diamond (NVIDIA)
Yifan Ding (NVIDIA)
Wenhao Ding (Research Scientist, NVIDIA Research)
Liang Feng (NVIDIA)
Greg Heinrich (NVIDIA)
Jack Huang (NVIDIA)
Peter Karkus (Research Scientist, NVIDIA Research)
Boyi Li (NVIDIA)
Pinyi Li (NVIDIA)
Tsung-Yi Lin (Research Scientist, NVIDIA)
Dongran Liu (NVIDIA)
Ming-Yu Liu (NVIDIA)
Langechuan Liu (NVIDIA)
Zhijian Liu (NVIDIA)
Jason Lu (NVIDIA)
Yunxiang Mao (NVIDIA)