DiMeR: Disentangled Mesh Reconstruction Model

📅 2025-04-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address conflicting optimization objectives and blurred geometric detail in joint geometry-texture reconstruction from sparse RGB views, this paper proposes a dual-stream decoupled framework: the geometry branch takes normal maps as geometric priors for standalone 3D structural modeling, while the texture branch uses RGB images to synthesize appearance. The two branches are trained end-to-end and jointly produce textured meshes. The authors introduce this dual-decoupled input paradigm and improve the mesh extraction algorithm so that 3D ground-truth supervision can be applied, enabling precise geometric optimization. Evaluated on GSO and OmniObject3D, the method reduces Chamfer distance by over 30% relative to state-of-the-art methods and improves reconstruction quality in sparse-view, single-image-to-3D, and text-to-3D settings.

πŸ“ Abstract
With the advent of large-scale 3D datasets, feed-forward 3D generative models, such as the Large Reconstruction Model (LRM), have gained significant attention and achieved remarkable success. However, we observe that RGB images often lead to conflicting training objectives and lack the necessary clarity for geometry reconstruction. In this paper, we revisit the inductive biases associated with mesh reconstruction and introduce DiMeR, a novel disentangled dual-stream feed-forward model for sparse-view mesh reconstruction. The key idea is to disentangle both the input and the framework into geometry and texture parts, thereby reducing the training difficulty of each part according to the principle of Occam's razor. Given that normal maps are strictly consistent with geometry and accurately capture surface variations, we use normal maps as the exclusive input to the geometry branch to reduce the complexity between the network's input and output. Moreover, we improve the mesh extraction algorithm to introduce 3D ground-truth supervision. For the texture branch, we use RGB images as input to obtain the textured mesh. Overall, DiMeR demonstrates robust capabilities across various tasks, including sparse-view reconstruction, single-image-to-3D, and text-to-3D. Extensive experiments show that DiMeR significantly outperforms previous methods, achieving over 30% improvement in Chamfer distance on the GSO and OmniObject3D datasets.
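The Chamfer distance used to evaluate reconstruction quality above can be sketched as follows. This is the standard symmetric nearest-neighbour formulation over two point sets, not code from the paper (which evaluates on sampled mesh surfaces at larger scale):

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    For each point in one set, take the squared distance to its nearest
    neighbour in the other set; average both directions and sum them.
    """
    # Pairwise squared distances via broadcasting, shape (N, M).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    # a -> b term plus b -> a term.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

For large point clouds, the O(NM) pairwise matrix is usually replaced by a k-d tree nearest-neighbour query, but the metric itself is the same.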
Problem

Research questions and friction points this paper is trying to address.

Disentangles geometry and texture for mesh reconstruction
Uses normal maps to simplify geometry input complexity
Improves mesh extraction with 3D ground truth supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangled dual-stream model for mesh reconstruction
Normal maps as input for geometry branch
Improved mesh extraction with 3D supervision