🤖 AI Summary
Existing methods for editable representations of high-resolution dynamic scenes struggle to balance editability with accurate modeling of complex occlusions: neural atlases support 2D editing but suffer from multi-object occlusion ambiguities, while scene graph models capture 3D spatial relationships yet lack view-consistent appearance editing. To address this, we propose Neural Atlas Graphs (NAGs), in which each graph node is a view-dependent neural atlas - integrating the 2D editability of neural atlases with the 3D relational reasoning of graph structures - enabling high-fidelity, view-consistent editing without explicit annotations. Fit at test time, NAGs jointly optimize their deformable atlas layers for both reconstruction and editing. On the Waymo Open Dataset, the method achieves a 5 dB PSNR gain over the state of the art; on the DAVIS video editing benchmark, it improves PSNR by more than 7 dB. NAGs support high-resolution environmental modifications and the synthesis of photorealistic counterfactual driving scenes.
📝 Abstract
Learning editable high-resolution representations of dynamic scenes is an open problem with applications across domains from autonomous driving to creative editing. The most successful approaches today trade off editability against supported scene complexity: neural atlases represent dynamic scenes as two deforming image layers, foreground and background, which are editable in 2D but break down when multiple objects occlude and interact. In contrast, scene graph models use annotated data such as masks and bounding boxes from autonomous-driving datasets to capture complex 3D spatial relationships, but their implicit volumetric node representations are challenging to edit view-consistently. We propose Neural Atlas Graphs (NAGs), a hybrid high-resolution scene representation in which every graph node is a view-dependent neural atlas, facilitating both 2D appearance editing and 3D ordering and positioning of scene elements. Fit at test time, NAGs achieve state-of-the-art quantitative results on the Waymo Open Dataset - a 5 dB PSNR increase over existing methods - and enable environmental editing at high resolution and visual quality, creating counterfactual driving scenarios with new backgrounds and edited vehicle appearance. We find that the method also generalizes beyond driving scenes, comparing favorably - by more than 7 dB in PSNR - to recent matting and video editing baselines on the DAVIS video dataset with a diverse set of human- and animal-centric scenes.