Cyc3D: Fine-grained Controllable 3D Generation via Cycle Consistency Regularization

📅 2025-04-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address inaccurate alignment between input conditions (e.g., edges, depth) and generated 3D geometry in controllable 3D generation, this paper proposes Cyc3D, a second-order cycle-consistency regularization framework. The method introduces a dual-constraint mechanism: (i) view consistency keeps the first-order and second-order generated 3D objects coherent, measured by semantic similarity, and (ii) condition consistency enforces precise recovery of fine-grained geometric details from the input conditions. The approach supports end-to-end generation driven jointly by text and a condition image, using a feed-forward 3D backbone network integrated with multi-view rendering, signal re-extraction, semantic similarity metrics, and PSNR-optimized losses. Experiments on mainstream benchmarks show improved controllability: +14.17% PSNR under edge guidance and +6.26% PSNR under sketch guidance. The method also achieves better fine-grained structural fidelity than state-of-the-art approaches.
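Since the reported gains are expressed in PSNR, a minimal sketch of that metric may help when reading the numbers. It uses the standard definition, 10 * log10(MAX^2 / MSE); the function name `psnr`, the flattened-list signal representation, and the default peak value of 1.0 are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized signals.

    Assumption: signals are flattened lists of floats in [0, max_value],
    e.g. an input edge map and the edge map re-extracted from a render.
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    # Mean squared error over all elements.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the extracted condition is closer to the input control; identical signals give an infinite PSNR.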

📝 Abstract
Despite the remarkable progress of 3D generation, achieving controllability, i.e., ensuring consistency between generated 3D content and input conditions like edge and depth, remains a significant challenge. Existing methods often struggle to maintain accurate alignment, leading to noticeable discrepancies. To address this issue, we propose Cyc3D, a new framework that enhances controllable 3D generation by explicitly encouraging cyclic consistency between the second-order 3D content, generated based on extracted signals from the first-order generation, and its original input controls. Specifically, we employ an efficient feed-forward backbone that can generate a 3D object from an input condition and a text prompt. Given an initial viewpoint and a control signal, a novel view is rendered from the generated 3D content, from which the extracted condition is used to regenerate the 3D content. This re-generated output is then rendered back to the initial viewpoint, followed by another round of control signal extraction, forming a cyclic process with two consistency constraints. View consistency ensures coherence between the two generated 3D objects, measured by semantic similarity to accommodate generative diversity. Condition consistency aligns the final extracted signal with the original input control, preserving structural or geometric details throughout the process. Extensive experiments on popular benchmarks demonstrate that Cyc3D significantly improves controllability, especially for fine-grained details, outperforming existing methods across various conditions (e.g., +14.17% PSNR for edge, +6.26% PSNR for sketch).
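The cyclic process described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the callables `generate`, `render`, `extract`, and `embed` are hypothetical stand-ins for the feed-forward backbone, the renderer, the condition extractor, and a semantic encoder, and the specific loss forms (one minus cosine similarity for view consistency, mean squared error for condition consistency) are assumptions chosen for simplicity.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

Vector = Sequence[float]  # toy flattened image, condition map, or embedding


@dataclass
class CycleLosses:
    view: float       # semantic mismatch between the two generated objects
    condition: float  # mismatch between final extracted signal and input control


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_squared_error(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cycle_losses(
    generate: Callable[[Vector, str], Any],  # (condition, prompt) -> 3D object
    render: Callable[[Any, Any], Vector],    # (3D object, viewpoint) -> image
    extract: Callable[[Vector], Vector],     # image -> control signal
    embed: Callable[[Vector], Vector],       # image -> semantic embedding
    condition: Vector,
    prompt: str,
    initial_view: Any,
    novel_view: Any,
) -> CycleLosses:
    # First-order generation from the original control signal.
    first_object = generate(condition, prompt)
    # Render a novel view and re-extract a control signal from it.
    novel_condition = extract(render(first_object, novel_view))
    # Second-order generation from the re-extracted signal.
    second_object = generate(novel_condition, prompt)
    # Render back to the initial viewpoint and extract the signal again.
    back_image = render(second_object, initial_view)
    final_condition = extract(back_image)

    # View consistency (assumed form): compare both objects semantically
    # through embeddings of their renders at the initial viewpoint.
    first_image = render(first_object, initial_view)
    view_loss = 1.0 - cosine_similarity(embed(first_image), embed(back_image))
    # Condition consistency (assumed form): final signal vs. original control.
    condition_loss = mean_squared_error(final_condition, condition)
    return CycleLosses(view=view_loss, condition=condition_loss)
```

With identity stand-ins for all four callables, both losses are (numerically) zero, which gives a quick sanity check; a backbone that distorts the control signal raises the condition term while the semantic view term can stay low, reflecting the tolerance for generative diversity mentioned in the abstract.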
Problem

Research questions and friction points this paper is trying to address.

Ensuring consistency between generated 3D content and input conditions
Improving controllability in fine-grained 3D generation details
Addressing discrepancies in alignment of 3D content with controls
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cycle consistency regularization for 3D generation
Feed-forward backbone with text prompts
Dual consistency constraints for controllability