🤖 AI Summary
This work addresses multi-view inconsistency in user-guided, free-form 3D scene expansion, where generative models often produce novel views that are geometrically misaligned with the original reconstruction. To tackle this, the authors propose SceneExpander, the first method to introduce a dual distillation mechanism for this task: anchor distillation preserves the original scene structure, while inserted-view self-distillation adapts latent geometry and appearance to align coherently with newly added views. The approach enables test-time adaptation of parameterized feed-forward 3D reconstruction models. Experiments on ETH scenes and in-the-wild data demonstrate that SceneExpander significantly improves scene-expansion quality and maintains strong multi-view consistency even under substantial view misalignment.
📝 Abstract
World building with 3D scene representations is increasingly important for content creation, simulation, and interactive experiences, yet real workflows are inherently iterative: creators must repeatedly extend an existing scene under user control. Motivated by this gap, we study 3D scene expansion in a user-centric workflow: starting from a real scene captured by multi-view images, we extend its coverage by inserting an additional view synthesized by a generative model. Unlike simple object editing or style transfer within a fixed scene, the inserted view is often 3D-misaligned with the original reconstruction, introducing geometry shifts, hallucinated content, or view-dependent artifacts that break global multi-view consistency. To address this challenge, we propose SceneExpander, which applies test-time adaptation to a parametric feed-forward 3D reconstruction model with two complementary distillation signals: anchor distillation stabilizes the original scene by distilling geometric cues from the captured views, while inserted-view self-distillation preserves observation-supported predictions yet adapts latent geometry and appearance to accommodate the misaligned inserted view. Experiments on ETH scenes and in-the-wild data demonstrate improved expansion behavior and reconstruction quality under misalignment.
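To make the two distillation signals concrete, here is a minimal sketch of how such a combined objective could look. This is an illustrative assumption, not the paper's actual code: the function name, the weighting factor `lam`, and the per-pixel `confidence` mask (intended to downweight hallucinated, observation-unsupported regions of the inserted view) are all hypothetical.

```python
import numpy as np

def dual_distill_loss(student_anchor, teacher_anchor,
                      student_inserted, self_target, confidence, lam=0.5):
    """Hypothetical combined test-time-adaptation objective.

    student_anchor / teacher_anchor: the adapting model's vs. a frozen
        teacher's predictions (e.g. depth or latent geometry) on the
        original captured views; matching them stabilizes the scene.
    student_inserted / self_target: the adapting model's prediction on the
        inserted view vs. a detached copy of its own earlier prediction.
    confidence: per-pixel weights in [0, 1] that preserve
        observation-supported regions while letting uncertain regions adapt.
    """
    # Anchor distillation: keep the original reconstruction stable.
    l_anchor = np.mean((student_anchor - teacher_anchor) ** 2)
    # Inserted-view self-distillation: weighted self-consistency term.
    l_self = np.mean(confidence * (student_inserted - self_target) ** 2)
    return l_anchor + lam * l_self
```

In an actual pipeline this loss would be minimized over the reconstruction model's parameters at test time (e.g. by a few gradient steps), with the teacher and self-targets held fixed.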