Generative Scenario Rollouts for End-to-End Autonomous Driving

📅 2026-01-16
🤖 AI Summary
This work proposes GeRo, a plug-and-play framework for Vision-Language-Action (VLA) driving models. It addresses two limitations of existing end-to-end autonomous driving systems: they rely on imitation learning from sparse trajectory annotations, and they struggle to support long-horizon, multi-agent scenarios with language-guided reasoning. GeRo combines language-conditioned generation with an autoregressive rollout strategy. From multi-view images, a scenario description, and ego-action questions, it jointly generates future latent tokens and textual responses, enabling temporally consistent, language-aligned multi-step reasoning. A rollout-consistency loss, supervised by ground truth or pseudo-labels, mitigates prediction drift and preserves text-action alignment. On Bench2Drive, GeRo combined with reinforcement learning achieves state-of-the-art closed-loop and open-loop performance with strong zero-shot robustness, improving driving score by +15.7 and success rate by +26.2.

📝 Abstract
Vision-Language-Action (VLA) models are emerging as highly effective planning models for end-to-end autonomous driving systems. However, current works mostly rely on imitation learning from sparse trajectory annotations and under-utilize their potential as generative models. We propose Generative Scenario Rollouts (GeRo), a plug-and-play framework for VLA models that jointly performs planning and generation of language-grounded future traffic scenes through an autoregressive rollout strategy. First, a VLA model is trained to encode ego vehicle and agent dynamics into latent tokens under supervision from planning, motion, and language tasks, facilitating text-aligned generation. Next, GeRo performs language-conditioned autoregressive generation. Given multi-view images, a scenario description, and ego-action questions, it generates future latent tokens and textual responses to guide long-horizon rollouts. A rollout-consistency loss stabilizes predictions using ground truth or pseudo-labels, mitigating drift and preserving text-action alignment. This design enables GeRo to perform temporally consistent, language-grounded rollouts that support long-horizon reasoning and multi-agent planning. On Bench2Drive, GeRo improves driving score and success rate by +15.7 and +26.2, respectively. By integrating reinforcement learning with generative rollouts, GeRo achieves state-of-the-art closed-loop and open-loop performance, demonstrating strong zero-shot robustness. These results highlight the promise of generative, language-conditioned reasoning as a foundation for safer and more interpretable end-to-end autonomous driving.
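The abstract describes the rollout mechanism only at a high level. As a rough illustration of why a rollout-consistency loss helps, the toy Python sketch below feeds each predicted latent back in as the next input and scores the whole rollout against targets. It is not GeRo's implementation: the plain float-vector latents, the `step_fn` predictor, the toy dynamics, and the mean-squared-error form of the loss are all assumptions made for illustration, and `rollout`, `rollout_consistency_loss`, `true_dynamics`, and `biased_model` are hypothetical names.

```python
from typing import Callable, List, Sequence

Latent = List[float]  # hypothetical stand-in for a latent token embedding


def rollout(step_fn: Callable[[Latent], Latent], z0: Latent, horizon: int) -> List[Latent]:
    """Autoregressively predict `horizon` future latents, feeding each prediction back in."""
    preds: List[Latent] = []
    z = z0
    for _ in range(horizon):
        z = step_fn(z)  # the next input is the model's own previous output
        preds.append(z)
    return preds


def rollout_consistency_loss(preds: Sequence[Latent], targets: Sequence[Latent]) -> float:
    """Mean squared error between rolled-out latents and their targets.

    Targets may be ground truth or pseudo-labels; the MSE form is an assumption.
    """
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and share a horizon")
    total, count = 0.0, 0
    for p, t in zip(preds, targets):
        if len(p) != len(t):
            raise ValueError("latent dimensions must match")
        total += sum((a - b) ** 2 for a, b in zip(p, t))
        count += len(p)
    return total / count


def true_dynamics(z: Latent) -> Latent:
    """Toy ground-truth dynamics used to produce targets (hypothetical)."""
    return [0.9 * x for x in z]


def biased_model(z: Latent) -> Latent:
    """Hypothetical, slightly miscalibrated one-step predictor."""
    return [0.95 * x for x in z]


if __name__ == "__main__":
    z0 = [1.0, -2.0]
    for horizon in (1, 4):
        targets = rollout(true_dynamics, z0, horizon)
        preds = rollout(biased_model, z0, horizon)
        print(horizon, rollout_consistency_loss(preds, targets))
```

Because the miscalibrated predictor consumes its own outputs, its per-step error grows with the horizon, so the loss over four steps exceeds the loss over one. A consistency term of this kind penalizes accumulated drift across the whole rollout rather than only single-step errors.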
Problem

Research questions and friction points this paper is trying to address.

autonomous driving
generative models
language-conditioned planning
long-horizon reasoning
multi-agent planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Scenario Rollouts
Vision-Language-Action Models
Autoregressive Rollout
Language-Grounded Planning
Rollout-Consistency Loss
R. Yasarla
Qualcomm AI Research
Deepti Hegde
Qualcomm AI Research
Shizhong Han
Johns Hopkins
Psychiatric Genetics, genetic epidemiology, bioinformatics
Hsin-Pai Cheng
Qualcomm AI Research
Yunxiao Shi
Qualcomm AI Research
computer vision, deep learning
Meysam Sadeghigooghari
Qualcomm Technologies, Inc.
Shweta Mahajan
University of British Columbia, Vector Institute, TU Darmstadt
Machine Learning, Computer Vision
Apratim Bhattacharyya
Qualcomm AI Research
Computer Vision, Machine Learning
Litian Liu
Qualcomm AI Research
Risheek Garrepalli
Qualcomm AI Research
Machine Learning, Generative Models, Reinforcement Learning, Computer Vision
Thomas Svantesson
Qualcomm Technologies, Inc.
F. Porikli
Qualcomm AI Research
Hong Cai
Qualcomm AI Research