🤖 AI Summary
Existing 2D-lifted 3D editing methods produce inconsistent multi-view outputs because they lack a view-consistent 2D editing model. This paper introduces C3Editor, the first framework to establish such a view-consistent 2D editing model for controllable, text-driven, and interactive 3D content editing. Its core contributions are: (1) a decoupled dual-LoRA architecture that separately optimizes photorealistic view reconstruction and multi-view consistency; (2) the integration of fine-tuned 2D diffusion models, explicit multi-view consistency constraints, and a 2D-to-3D feature-lifting strategy; and (3) support for user-guided manual editing. Extensive experiments show that C3Editor significantly outperforms state-of-the-art methods in both qualitative and quantitative evaluations, with substantial gains in cross-view visual consistency and geometric fidelity.
📝 Abstract
Existing 2D-lifting-based 3D editing methods often suffer from inconsistency, stemming from the lack of view-consistent 2D editing models and the difficulty of enforcing consistent edits across multiple views. To address these issues, we propose C3Editor, a controllable and consistent 2D-lifting-based 3D editing framework. Given an original 3D representation and a text-based editing prompt, our method selectively establishes a view-consistent 2D editing model to achieve superior 3D editing results. The process begins with the controlled selection of a ground-truth (GT) view and its corresponding edited image as the optimization target, allowing for user-defined manual edits. We then fine-tune the 2D editing model within the GT view and across multiple views to align with the GT-edited image while enforcing multi-view consistency. Because GT-view fitting and multi-view consistency impose distinct requirements, we introduce separate LoRA modules for targeted fine-tuning. Our approach delivers more consistent and controllable 2D and 3D editing results than existing 2D-lifting-based methods, outperforming them in both qualitative and quantitative evaluations.
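The decoupled dual-LoRA idea can be illustrated with a toy sketch. The paper does not release this code; the NumPy example below is a minimal, assumption-laden stand-in (names like `A_gt`/`B_mv` are hypothetical) showing the key property: two low-rank adapters on a frozen base weight can be optimized and applied independently, so tuning the GT-view adapter leaves the multi-view-consistency pathway untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # toy feature dimension and LoRA rank

# Frozen base weight, standing in for one layer of the 2D editing model.
W = rng.standard_normal((d, d))

# Two decoupled LoRA adapters: one for GT-view fitting, one for
# multi-view consistency. Standard LoRA init: B = 0, so each
# low-rank delta A @ B starts as an exact zero.
A_gt, B_gt = rng.standard_normal((d, r)), np.zeros((r, d))
A_mv, B_mv = rng.standard_normal((d, r)), np.zeros((r, d))

def forward(x, use_gt=True, use_mv=True):
    """Frozen weight plus whichever low-rank adapters are enabled."""
    W_eff = W.copy()
    if use_gt:
        W_eff += A_gt @ B_gt
    if use_mv:
        W_eff += A_mv @ B_mv
    return x @ W_eff.T

x = rng.standard_normal((1, d))

# With B = 0, both adapters are inert: output equals the base model's.
assert np.allclose(forward(x), x @ W.T)

# "Fine-tune" only the GT-view adapter (a crude stand-in for a
# gradient step); the multi-view adapter's pathway is unaffected.
B_gt += 0.1
assert np.allclose(forward(x, use_gt=False), x @ W.T)
```

Keeping the two objectives in separate adapters means each can be trained with its own loss and toggled or merged at inference without interfering with the other.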