HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation

📅 2026-05-12
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing autoregressive driving world models drift when rolled out on their own predictions, offer only limited supervision horizons, and incur high memory overhead in long-horizon closed-loop simulation. This work proposes HorizonDrive, an anti-drifting training-and-distillation framework that makes the teacher model itself capable of stable autoregressive rollout and self-correction, lifting the single-pass output-length limit on supervision and enabling minute-scale simulation under bounded memory. It combines Scheduled Rollout Recovery (SRR) training, which stabilizes the teacher, with Teacher Rollout DMD (TRD), which aligns a short-window student to the teacher's long rollouts through distribution-matching distillation. On nuScenes, it reduces FID and FVD by 52% and 37%, respectively, and lowers ARE and DTW by 21% and 9%, relative to the strongest long-horizon streaming baselines, while remaining competitive with single-pass driving video generators.
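The idea of autoregressive rollout under bounded memory can be illustrated with a minimal sketch. It is not the paper's implementation: `ar_rollout`, `toy_model`, the list-of-floats `Clip` type, and the `window` parameter are all hypothetical stand-ins, and the only point shown is that the context stays fixed in size however long the rollout runs.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

# Hypothetical toy stand-in for a video clip; a real model would use tensors.
Clip = List[float]


def ar_rollout(
    predict_next: Callable[[Sequence[Clip]], Clip],
    seed_history: Sequence[Clip],
    num_steps: int,
    window: int,
) -> List[Clip]:
    """Roll a predictor forward on its own outputs, keeping at most `window` clips."""
    # A deque with maxlen drops the oldest clip automatically, so memory is bounded.
    context: Deque[Clip] = deque(seed_history, maxlen=window)
    outputs: List[Clip] = []
    for _ in range(num_steps):
        next_clip = predict_next(list(context))
        outputs.append(next_clip)
        context.append(next_clip)  # the prediction becomes part of the next context
    return outputs


seen_context_lengths: List[int] = []


def toy_model(history: Sequence[Clip]) -> Clip:
    """Hypothetical predictor: adds 1 to every value of the most recent clip."""
    seen_context_lengths.append(len(history))
    return [value + 1.0 for value in history[-1]]


rollout = ar_rollout(toy_model, seed_history=[[0.0, 0.0]], num_steps=5, window=3)
```

Here the rollout length (`num_steps`) and the context size (`window`) are decoupled, which is the property that lets a rollout extend to long horizons without the context growing.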
๐Ÿ“ Abstract
Closed-loop driving simulation requires real-time interaction beyond short offline clips, pushing current driving world models toward autoregressive (AR) rollout. Existing AR distillation approaches typically rely on frame sinks or student-side degradation training. The former transfers poorly to driving due to fast ego-motion and rapid scene changes, while the latter remains bounded by the teacher's single-pass output length and thus provides only a limited supervision horizon. A natural question is: can the teacher itself be extended via AR rollout to provide unbounded-horizon supervision at bounded memory cost? The key difficulty is that a standard teacher drifts under its own predictions, contaminating the supervision it provides. Our key insight is to make the teacher rollout-capable, ensuring reliable supervision from its own AR rollouts. This is instantiated as HorizonDrive, an anti-drifting training-and-distillation framework for AR driving simulation. First, scheduled rollout recovery (SRR) trains the base model to reconstruct ground-truth future clips from prediction-corrupted histories, yielding a teacher that remains stable across long AR rollouts. Second, the rollout-capable teacher is extended via AR rollout, providing long-horizon distribution-matching supervision under bounded memory, while a short-window student aligns to it with teacher rollout DMD (TRD) for efficient real-time deployment. HorizonDrive natively supports minute-scale AR rollout under bounded memory; on nuScenes, HorizonDrive reduces FID by 52% and FVD by 37%, and lowers ARE and DTW by 21% and 9% relative to the strongest long-horizon streaming baselines, while remaining competitive with single-pass driving video generators.
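The SRR step described in the abstract, reconstructing ground-truth future clips from prediction-corrupted histories, might be sketched roughly as follows. The abstract does not specify the schedule or the corruption rule, so the linear ramp in `corruption_prob`, the per-clip replacement in `build_corrupted_history`, and the toy predictor are all assumptions made for illustration only.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical toy stand-in for a video clip.
Clip = List[float]


def corruption_prob(step: int, total_steps: int, max_prob: float = 0.5) -> float:
    """Assumed schedule: ramp linearly from 0 to `max_prob` over training."""
    if total_steps <= 1:
        return max_prob
    fraction = min(max(step / (total_steps - 1), 0.0), 1.0)
    return max_prob * fraction


def build_corrupted_history(
    gt_history: Sequence[Clip],
    predict_next: Callable[[Sequence[Clip]], Clip],
    prob: float,
    rng: random.Random,
) -> List[Clip]:
    """Replace some ground-truth clips with the model's own predictions."""
    history: List[Clip] = []
    for gt_clip in gt_history:
        # The first clip is always ground truth: there is nothing to predict from yet.
        if history and rng.random() < prob:
            history.append(predict_next(history))
        else:
            history.append(gt_clip)
    return history


def toy_predict(history: Sequence[Clip]) -> Clip:
    """Hypothetical predictor: adds 1 to every value of the most recent clip."""
    return [value + 1.0 for value in history[-1]]


rng = random.Random(0)
gt = [[0.0], [10.0], [20.0]]
clean = build_corrupted_history(gt, toy_predict, prob=0.0, rng=rng)
corrupted = build_corrupted_history(gt, toy_predict, prob=1.0, rng=rng)
```

In this sketch the training target would still be the ground-truth future clip, so the model learns to recover from its own errors instead of compounding them; only the conditioning history is corrupted.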
Problem

Research questions and friction points this paper is trying to address.

autoregressive world model
long-horizon driving simulation
teacher drift
closed-loop simulation
supervision horizon
Innovation

Methods, ideas, or system contributions that make the work stand out.

autoregressive world model
self-corrective training
long-horizon simulation
teacher-student distillation
driving simulation