🤖 AI Summary
Monocular 3D animal reconstruction remains challenging because of complex poses, severe self-occlusion, and intricate fur detail. It is made harder by the lack of pose-annotated 3D supervision and of multi-view data, particularly back views. As a result, existing methods often produce distorted geometry and inconsistent textures. To address these issues, this work proposes a framework that refines a parametric mesh into high-fidelity signed distance field (SDF) geometry through multi-view normal field optimization with diffusion-enhanced normals. It then generates view-consistent, photorealistic textures through conditional local inpainting guided by structure and style cues. Using only about 7,000 unannotated dog images with no 3D labels, the method outperforms current state-of-the-art approaches in both geometric accuracy and texture realism, producing complete and lifelike 3D reconstructions of dogs.
📝 Abstract
Monocular 3D animal reconstruction is challenging due to complex articulation, self-occlusion, and fine-scale details such as fur. Existing methods often produce distorted geometry and inconsistent textures because articulated 3D supervision is lacking and back-view images are scarce in 2D datasets, which makes reconstructing unobserved regions particularly difficult. To address these limitations, we propose DogWeave, a model-based framework for reconstructing high-fidelity 3D canine models from a single RGB image. DogWeave improves geometry by refining a coarsely initialized parametric mesh into a detailed SDF representation through multi-view normal field optimization using diffusion-enhanced normals. It then generates view-consistent textures through conditional partial inpainting guided by structure and style cues, enabling realistic reconstruction of unobserved regions. Trained on only about 7,000 dog images processed by our 2D pipeline, DogWeave produces complete, realistic 3D models and outperforms state-of-the-art single-image-to-3D reconstruction methods in both shape accuracy and texture realism for canines.
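The abstract does not specify DogWeave's loss functions, so the following is only a hedged illustration of one common way a normal field can supervise an SDF. Surface normals are estimated as the normalized SDF gradient (central finite differences here), and the SDF is penalized by 1 - cosine similarity against target normals gathered from several views. The analytic sphere SDF, the `views` list of `(points, target_normals)` pairs, and every function name are hypothetical stand-ins. In the described method the targets would come from diffusion-enhanced normal maps and the SDF would be a learned field updated by gradient descent, which this sketch does not implement.

```python
import numpy as np


def sdf_sphere(points, radius=1.0):
    # Hypothetical stand-in SDF: distance to a sphere at the origin
    # (negative inside, positive outside).
    return np.linalg.norm(points, axis=-1) - radius


def sdf_normals(sdf, points, eps=1e-4):
    # Estimate surface normals as the normalized SDF gradient,
    # using central finite differences along each axis.
    grads = np.zeros_like(points)
    for i in range(3):
        offset = np.zeros(3)
        offset[i] = eps
        grads[:, i] = (sdf(points + offset) - sdf(points - offset)) / (2.0 * eps)
    return grads / np.linalg.norm(grads, axis=-1, keepdims=True)


def normal_consistency_loss(sdf, points, target_normals):
    # Mean (1 - cosine similarity) between SDF normals and target normals:
    # 0 when aligned, 2 when exactly opposite.
    n = sdf_normals(sdf, points)
    t = target_normals / np.linalg.norm(target_normals, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(n * t, axis=-1)))


def multiview_normal_loss(sdf, views):
    # Average the per-view loss; each view is a hypothetical
    # (surface points, target normals) pair seen from one camera.
    return float(np.mean([normal_consistency_loss(sdf, p, t) for p, t in views]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(100, 3))
    pts /= np.linalg.norm(pts, axis=-1, keepdims=True)  # points on the unit sphere
    views = [(pts[:50], pts[:50]), (pts[50:], pts[50:])]  # radial targets, two "views"
    print(multiview_normal_loss(sdf_sphere, views))  # close to 0.0
    print(multiview_normal_loss(sdf_sphere, [(pts, -pts)]))  # close to 2.0
```

Finite differences keep the sketch dependency-free. A trainable neural SDF would typically obtain the gradient through automatic differentiation instead, so that the normal loss can be backpropagated into the network weights.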