MADrive: Memory-Augmented Driving Scene Modeling

📅 2025-06-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing autonomous driving scene reconstruction methods rely heavily on the original observations, which limits photorealistic synthesis under substantial edits or novel configurations. This paper introduces a memory-augmented scene modeling paradigm: an external large-scale corpus of 360° vehicle videos (MAD-Cars, comprising ~70K in-the-wild videos) supports cross-video semantic alignment, monocular video-to-3D asset reconstruction, orientation-consistent vehicle replacement, and physics-inspired relighting. The approach combines 3D Gaussian splatting reconstruction with joint appearance-pose retrieval, enabling controllable, multi-view-consistent editing of vehicle type, pose, and illumination across diverse scenes. Qualitative and quantitative evaluations demonstrate significant improvements over baselines. To the authors' knowledge, this is the first method to achieve controllable, photorealistic driving scene synthesis driven directly by in-the-wild videos.

📝 Abstract
Recent advances in scene reconstruction have pushed toward highly realistic modeling of autonomous driving (AD) environments using 3D Gaussian splatting. However, the resulting reconstructions remain closely tied to the original observations and struggle to support photorealistic synthesis of significantly altered or novel driving scenarios. This work introduces MADrive, a memory-augmented reconstruction framework designed to extend the capabilities of existing scene reconstruction methods by replacing observed vehicles with visually similar 3D assets retrieved from a large-scale external memory bank. Specifically, we release MAD-Cars, a curated dataset of ~70K 360° car videos captured in the wild, and present a retrieval module that finds the most similar car instances in the memory bank, reconstructs the corresponding 3D assets from video, and integrates them into the target scene through orientation alignment and relighting. The resulting replacements provide complete multi-view representations of vehicles in the scene, enabling photorealistic synthesis of substantially altered configurations, as demonstrated in our experiments. Project page: https://yandex-research.github.io/madrive/
Problem

Research questions and friction points this paper is trying to address.

Reconstructions remain closely tied to the original observations
Photorealistic synthesis degrades under substantial edits or novel configurations
Observed vehicles lack complete multi-view coverage, limiting realistic scene alteration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory-augmented framework for driving scenes
Retrieves similar 3D assets from external memory
Integrates assets via orientation and relighting
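The retrieve-then-integrate idea above can be sketched in a few lines. This is an illustrative toy only: the embedding model, similarity measure, and function names are assumptions, not MADrive's actual pipeline, which uses joint appearance-pose retrieval and Gaussian-splatting reconstruction.

```python
import numpy as np


def retrieve_best_match(query_feat, memory_feats):
    """Return the index of the memory-bank entry most similar to the query.

    Cosine similarity over appearance embeddings; a stand-in for the
    paper's retrieval module (the actual features are assumptions here).
    """
    q = query_feat / np.linalg.norm(query_feat)
    m = memory_feats / np.linalg.norm(memory_feats, axis=1, keepdims=True)
    return int(np.argmax(m @ q))


def align_orientation(asset_points, yaw):
    """Rotate a reconstructed asset's points about the vertical (z) axis
    so the asset matches the observed vehicle's heading in the scene."""
    c, s = np.cos(yaw), np.sin(yaw)
    rot_z = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
    return asset_points @ rot_z.T


# Toy usage: a memory bank of 3 entries with 4-D appearance features.
bank = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.7, 0.7, 0.0, 0.0]])
query = np.array([0.6, 0.8, 0.0, 0.0])
idx = retrieve_best_match(query, bank)          # closest instance in the bank
rotated = align_orientation(np.array([[1.0, 0.0, 0.0]]), np.pi / 2)
```

In the full system the retrieved instance would be reconstructed from its 360° video into a 3D asset and relit to match scene illumination before insertion; those steps are omitted here.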