InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation

📅 2026-02-03
📈 Citations: 6
Influential: 0
🤖 AI Summary
This work addresses the difficulty existing driving world models have in maintaining both instance-level temporal consistency and spatial geometric fidelity in multi-view video generation. The authors propose InstaDrive, an instance-aware generative framework with two components: an Instance Flow Guider, which extracts and propagates instance features across frames to preserve instance identity, and a Spatial Geometric Aligner, which improves instance positioning and explicitly models occlusion hierarchies. The method achieves state-of-the-art video generation quality and improves downstream autonomous driving tasks on the nuScenes dataset. Separately, the authors use CARLA's autopilot to procedurally and stochastically simulate rare, safety-critical scenarios for safety evaluation of autonomous driving systems.

📝 Abstract
Autonomous driving relies on robust models trained on high-quality, large-scale multi-view driving videos. While world models offer a cost-effective solution for generating realistic driving videos, they struggle to maintain instance-level temporal consistency and spatial geometric fidelity. To address these challenges, we propose InstaDrive, a novel framework that enhances driving video realism through two key advancements: (1) Instance Flow Guider, which extracts and propagates instance features across frames to enforce temporal consistency, preserving instance identity over time. (2) Spatial Geometric Aligner, which improves spatial reasoning, ensures precise instance positioning, and explicitly models occlusion hierarchies. By incorporating these instance-aware mechanisms, InstaDrive achieves state-of-the-art video generation quality and enhances downstream autonomous driving tasks on the nuScenes dataset. Additionally, we utilize CARLA's autopilot to procedurally and stochastically simulate rare but safety-critical driving scenarios across diverse maps and regions, enabling rigorous safety evaluation for autonomous systems. Our project page is https://shanpoyang654.github.io/InstaDrive/page.html.
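
To make the phrase "occlusion hierarchies" concrete, here is a minimal sketch of depth-ordered compositing (the painter's algorithm) over 2D instance boxes. It is an illustration only and is not the paper's Spatial Geometric Aligner; the `Box` type, its fields, and `render_occlusion_map` are hypothetical names chosen for this example, and the abstract does not say how InstaDrive encodes occlusion.

```python
# Illustrative only: a depth-ordered (painter's algorithm) rasterizer.
# NOT InstaDrive's implementation; all names here are hypothetical.
from dataclasses import dataclass


@dataclass
class Box:
    instance_id: int  # assumed per-instance identifier (0 is reserved for background)
    x0: int           # left edge, inclusive
    y0: int           # top edge, inclusive
    x1: int           # right edge, exclusive
    y1: int           # bottom edge, exclusive
    depth: float      # assumed distance from the camera; smaller means nearer


def render_occlusion_map(boxes, width, height):
    """Return a height x width grid of instance ids, with 0 as background."""
    grid = [[0] * width for _ in range(height)]
    # Paint far-to-near so that nearer instances overwrite farther ones.
    for box in sorted(boxes, key=lambda b: b.depth, reverse=True):
        for y in range(max(box.y0, 0), min(box.y1, height)):
            for x in range(max(box.x0, 0), min(box.x1, width)):
                grid[y][x] = box.instance_id
    return grid


far_car = Box(instance_id=1, x0=0, y0=0, x1=4, y1=2, depth=30.0)
near_car = Box(instance_id=2, x0=2, y0=0, x1=6, y1=2, depth=10.0)
occlusion_map = render_occlusion_map([near_car, far_car], width=6, height=2)
```

Where the two boxes overlap (columns 2 and 3), the nearer instance's id wins regardless of input order, which is the ordering property an occlusion-aware layout needs to respect.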
Problem

Research questions and friction points this paper is trying to address.

temporal consistency
spatial geometric fidelity
instance-level consistency
driving video generation
world models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Instance-Aware World Model
Temporal Consistency
Spatial Geometric Alignment
Driving Video Generation
Occlusion Hierarchy