Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation

📅 2024-11-25
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Single-image high-fidelity 3D mesh generation faces three core challenges: multi-view inconsistency, geometric reconstruction distortion, and texture blurriness. To address these, Fancy123 keeps the mesh produced by a Large Reconstruction Model (LRM) but discards the LRM's blurry predicted colors. Two plug-and-play modules, appearance enhancement and fidelity enhancement, operate at inference time: the former deforms the 2D multi-view images to realign inconsistent pixels, and the latter deforms the 3D mesh to match the input image. An unprojection step then maps the input image and the deformed multi-view renderings onto the mesh, enforcing pixel-level consistency between geometry and appearance. By combining 2D multi-view diffusion priors, geometric deformation, and precise unprojection, the method achieves state-of-the-art performance on multiple benchmarks, significantly improving geometric accuracy (28.6% reduction in Chamfer distance) and texture sharpness (31.2% reduction in LPIPS). Moreover, the modules integrate seamlessly with mainstream single-image 3D reconstruction methods.

📝 Abstract
Generating 3D meshes from a single image is an important but ill-posed task. Existing methods mainly adopt 2D multiview diffusion models to generate intermediate multiview images, and use the Large Reconstruction Model (LRM) to create the final meshes. However, the multiview images exhibit local inconsistencies, and the meshes often lack fidelity to the input image or look blurry. We propose Fancy123, featuring two enhancement modules and an unprojection operation to address the above three issues, respectively. The appearance enhancement module deforms the 2D multiview images to realign misaligned pixels for better multiview consistency. The fidelity enhancement module deforms the 3D mesh to match the input image. The unprojection of the input image and deformed multiview images onto LRM's generated mesh ensures high clarity, discarding LRM's predicted blurry-looking mesh colors. Extensive qualitative and quantitative experiments verify Fancy123's SoTA performance with significant improvement. Also, the two enhancement modules are plug-and-play and work at inference time, allowing seamless integration into various existing single-image-to-3D methods.
Problem

Research questions and friction points this paper is trying to address.

Generating high-quality 3D meshes from single images
Addressing inconsistencies in multiview image alignment
Improving mesh fidelity to the input image and texture clarity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Appearance enhancement realigns the multiview images for consistency
Fidelity enhancement deforms the mesh to match the input image
Unprojection of images onto the mesh yields sharp colors
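As a rough illustration of the unprojection idea above, the sketch below back-projects input-image colors onto mesh vertices through a pinhole camera model. This is a minimal assumption-laden sketch, not Fancy123's implementation: all names (`K`, `world_to_cam`, `unproject_colors`) are hypothetical, and occlusion handling (which a real method needs) is omitted.

```python
# Hypothetical sketch: project each mesh vertex into the input view
# with an assumed pinhole camera, then sample the image color at the
# pixel it lands on. Occlusion/visibility testing is deliberately
# omitted for brevity; Fancy123 itself is more involved.
import numpy as np

def unproject_colors(vertices, image, K, world_to_cam):
    """Assign each vertex the color of the pixel it projects to.

    vertices:     (N, 3) mesh vertex positions in world space
    image:        (H, W, 3) uint8 input RGB image
    K:            (3, 3) camera intrinsics
    world_to_cam: (4, 4) extrinsics (world -> camera)
    """
    H, W, _ = image.shape
    # Transform vertices into camera space (homogeneous coordinates).
    homo = np.hstack([vertices, np.ones((len(vertices), 1))])
    cam = (world_to_cam @ homo.T).T[:, :3]
    # Perspective projection onto the image plane.
    proj = (K @ cam.T).T
    uv = proj[:, :2] / proj[:, 2:3]
    # Round to pixel centers and clamp to the image bounds.
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    # Only vertices in front of the camera receive a color.
    visible = cam[:, 2] > 0
    colors = np.zeros_like(vertices, dtype=float)
    colors[visible] = image[v[visible], u[visible]] / 255.0
    return colors
```

In the paper's pipeline this sampling replaces the LRM's predicted vertex colors, which is why the result stays as sharp as the input image rather than inheriting the LRM's blur.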
👥 Authors
Qiao Yu — Huazhong University of Science and Technology, Wuhan, Hubei, China
Xianzhi Li — Huazhong University of Science and Technology (3D vision, geometry processing)
Yuan Tang — Huazhong University of Science and Technology, Wuhan, Hubei, China
Xu Han — Huazhong University of Science and Technology, Wuhan, Hubei, China
Long Hu — Associate Professor of Computer Science, Huazhong University of Science and Technology (Edge Computing, Big Data, Affective Computing, Deep Reinforcement Learning)
Yixue Hao — Highly Cited Researcher, Associate Professor, Huazhong University of Science and Technology (Cognitive Computing, Edge Computing, Healthcare Big Data)
Min Chen — South China University of Technology, Guangzhou, Guangdong, China