TRACE: High-Fidelity 3D Scene Editing via Tangible Reconstruction and Geometry-Aligned Contextual Video Masking

📅 2026-04-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses fine-grained, part-level, high-fidelity editing of 3D scenes while preserving structural integrity. The authors propose a mesh-guided 3D Gaussian Splatting (3DGS) editing framework that anchors video diffusion models to explicit 3D geometry to enable automated, high-fidelity manipulation. Key contributions include MV-TRACE, described as the first multi-view consistent dataset dedicated to scene-coherent object addition and modification, alongside two mechanisms: Tangible Geometry Anchoring (TGA) and Contextual Video Masking (CVM). The method employs a three-stage pipeline: multi-view 3D-anchor synthesis, two-phase registration of inserted meshes with the 3DGS scene, and autoregressive video generation guided by 3D projections. Experiments are reported to show that the approach consistently outperforms existing methods, especially in editing versatility and structural integrity, producing temporally stable, physically grounded 3D scene edits.

📝 Abstract

We present TRACE, a mesh-guided 3DGS editing framework that achieves automated, high-fidelity scene transformation. By anchoring video diffusion with explicit 3D geometry, TRACE uniquely enables fine-grained, part-level manipulation, such as local pose shifting or component replacement, while preserving the structural integrity of the central subject, a capability largely absent in existing editing methods. Our approach comprises three key stages: (1) Multi-view 3D-Anchor Synthesis, which leverages a sparse-view editor trained on our MV-TRACE dataset (the first multi-view consistent dataset dedicated to scene-coherent object addition and modification) to generate spatially consistent 3D-anchors; (2) Tangible Geometry Anchoring (TGA), which ensures precise spatial synchronization between inserted meshes and the 3DGS scene via two-phase registration; and (3) Contextual Video Masking (CVM), which integrates 3D projections into an autoregressive video pipeline to achieve temporally stable, physically grounded rendering. Extensive experiments demonstrate that TRACE consistently outperforms existing methods, especially in editing versatility and structural integrity.
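
To make the idea of turning 3D geometry into a per-frame 2D mask more concrete, here is a minimal sketch. It is not the paper's CVM implementation, whose details are not given in the abstract. It assumes a simple pinhole camera, mesh vertices already expressed in camera space, and a padded bounding-box mask. The names `PinholeCamera`, `project_point`, and `bbox_mask` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PinholeCamera:
    # Hypothetical intrinsics; a real pipeline would read these from the scene's cameras.
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int


def project_point(cam, point):
    """Project a camera-space point (x, y, z) to pixel coordinates (u, v).

    Returns None for points at or behind the camera plane (z <= 0).
    """
    x, y, z = point
    if z <= 0:
        return None
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy)


def bbox_mask(cam, points, pad=0):
    """Build a binary mask covering the padded 2D bounding box of the projected points.

    The mask is a list of rows, each a list of 0/1 values, clipped to the image bounds.
    """
    pixels = [p for p in (project_point(cam, pt) for pt in points) if p is not None]
    mask = [[0] * cam.width for _ in range(cam.height)]
    if not pixels:
        return mask
    u_min = max(math.floor(min(u for u, _ in pixels)) - pad, 0)
    u_max = min(math.floor(max(u for u, _ in pixels)) + pad, cam.width - 1)
    v_min = max(math.floor(min(v for _, v in pixels)) - pad, 0)
    v_max = min(math.floor(max(v for _, v in pixels)) + pad, cam.height - 1)
    for v in range(v_min, v_max + 1):
        for u in range(u_min, u_max + 1):
            mask[v][u] = 1
    return mask
```

Calling `bbox_mask` once per camera pose along a trajectory would give one mask per video frame. A bounding box is the simplest choice here; a rasterized mesh silhouette would follow the inserted object more tightly, at the cost of a more involved example.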

Problem

Research questions and friction points this paper is trying to address.

3D scene editing

structural integrity

part-level manipulation

high-fidelity editing

geometry consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting

Geometry-Aligned Editing

Multi-view Consistency