🤖 AI Summary
Existing 3D editing methods rely on per-scene optimization, which is computationally expensive and often leads to multi-view inconsistencies. This work proposes a fully feedforward 3D editing framework built upon the TRELLIS backbone, enabling globally consistent edits from a single edited view. The key innovations are Voxel FlowEdit, an edit-driven flow in a sparse voxel latent space that achieves globally consistent deformation in a single forward pass; a normal-guided single-view-to-multi-view generation module that serves as an external appearance prior and recovers high-frequency texture details; and the adaptation of training-free 2D editing to structured 3D representations. The proposed method improves editing efficiency while preserving high-fidelity geometry and appearance, along with strong multi-view consistency.
📝 Abstract
Existing 3D editing methods rely on computationally intensive per-scene iterative optimization and suffer from multi-view inconsistency. We propose an effective, fully feedforward 3D editing framework based on the TRELLIS generative backbone, capable of modifying 3D models from a single edited view. Our framework addresses two key issues: adapting training-free 2D editing to structured 3D representations, and overcoming the appearance-fidelity bottleneck of compressed 3D features. To ensure geometric consistency, we introduce Voxel FlowEdit, an edit-driven flow in the sparse voxel latent space that achieves globally consistent 3D deformation in a single pass. To restore high-fidelity details, we develop a normal-guided single-view-to-multi-view generation module as an external appearance prior, which recovers high-frequency textures. Experiments demonstrate that our method enables fast, globally consistent, and high-fidelity 3D model editing.
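The abstract does not spell out the Voxel FlowEdit update rule, so the following is only an illustrative sketch of a generic FlowEdit-style, inversion-free update applied to a toy sparse voxel latent. It rests on several assumptions that do not come from the paper: the latent is a plain dictionary from voxel coordinates to feature vectors, the conditions `c_src` and `c_tar` are simple vectors, and `toy_velocity` is a hand-written stand-in for a learned velocity field (it is not the TRELLIS model or its API). The function names `flowedit_step` and `edit_sparse_voxels` are hypothetical.

```python
import random


def toy_velocity(x, t, cond):
    """Toy stand-in for a learned velocity field (NOT the TRELLIS model).

    This is the velocity of a rectified flow whose data distribution is a
    point mass at `cond`; it is used only to keep the sketch runnable.
    """
    return [(xi - ci) / t for xi, ci in zip(x, cond)]


def flowedit_step(z_edit, x_src, t, t_next, c_src, c_tar, rng):
    """One FlowEdit-style Euler step for a single voxel feature vector."""
    noise = [rng.gauss(0.0, 1.0) for _ in x_src]
    # Noisy version of the source latent at time t.
    z_src = [(1.0 - t) * xs + t * n for xs, n in zip(x_src, noise)]
    # Matching noisy target latent: same noise offset, applied to the edit path.
    z_tar = [ze + zs - xs for ze, zs, xs in zip(z_edit, z_src, x_src)]
    v_src = toy_velocity(z_src, t, c_src)
    v_tar = toy_velocity(z_tar, t, c_tar)
    # Move along the difference of target and source velocities.
    return [ze + (t_next - t) * (vt - vs)
            for ze, vt, vs in zip(z_edit, v_tar, v_src)]


def edit_sparse_voxels(latents, c_src, c_tar, steps=10, seed=0):
    """Integrate the edit flow from t=1 down to t=0 over all active voxels."""
    rng = random.Random(seed)
    times = [1.0 - i / steps for i in range(steps + 1)]  # 1.0, ..., 0.0
    edited = {coord: list(feat) for coord, feat in latents.items()}
    for t, t_next in zip(times, times[1:]):
        for coord, x_src in latents.items():
            edited[coord] = flowedit_step(
                edited[coord], x_src, t, t_next, c_src, c_tar, rng
            )
    return edited


# Hypothetical sparse latent: only two voxels are active.
latents = {(0, 0, 0): [0.5, -1.0], (3, 1, 2): [2.0, 0.25]}
edited = edit_sparse_voxels(latents, c_src=[0.0, 0.0], c_tar=[1.0, -1.0])
```

With this particular toy velocity field, the sampled noise cancels in the velocity difference, and each voxel feature ends up shifted by `c_tar - c_src` while keeping its source-specific offset. A real learned model would not behave this simply. The sketch is meant only to show why no inversion or per-scene optimization loop is needed: the edit is a fixed number of forward evaluations starting directly from the source latent.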