4D-Animal: Freely Reconstructing Animatable 3D Animals from Videos

📅 2025-07-14
🤖 AI Summary
This work addresses the challenging problem of unsupervised 4D reconstruction of animatable 3D animal models from monocular video, eliminating reliance on manually annotated sparse semantic keypoints. We propose an end-to-end framework grounded in the SMAL parametric prior, which integrates dense features extracted from pretrained 2D vision models to establish multi-level geometric and appearance alignment—spanning silhouette, part, pixel, and temporal dimensions. By jointly optimizing a dense feature mapping network and a deformation field, our method achieves high-fidelity, temporally coherent 4D animal modeling. Extensive evaluation across diverse animal categories demonstrates significant improvements over both model-based and model-free state-of-the-art approaches. The reconstructed 3D assets exhibit topological consistency, natural articulation under pose control, strong generalization to unseen poses and species, and practical deployability for downstream animation tasks.

📝 Abstract
Existing methods for reconstructing animatable 3D animals from videos typically rely on sparse semantic keypoints to fit parametric models. However, obtaining such keypoints is labor-intensive, and keypoint detectors trained on limited animal data are often unreliable. To address this, we propose 4D-Animal, a novel framework that reconstructs animatable 3D animals from videos without requiring sparse keypoint annotations. Our approach introduces a dense feature network that maps 2D representations to SMAL parameters, enhancing both the efficiency and stability of the fitting process. Furthermore, we develop a hierarchical alignment strategy that integrates silhouette, part-level, pixel-level, and temporal cues from pre-trained 2D visual models to produce accurate and temporally coherent reconstructions across frames. Extensive experiments demonstrate that 4D-Animal outperforms both model-based and model-free baselines. Moreover, the high-quality 3D assets generated by our method can benefit other 3D tasks, underscoring its potential for large-scale applications. The code is released at https://github.com/zhongshsh/4D-Animal.

Problem

Research questions and friction points this paper is trying to address.

Reconstructs animatable 3D animals without keypoint annotations
Improves efficiency and stability in fitting 3D animal models
Ensures accurate and temporally coherent 3D reconstructions from videos
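
To make the temporal-coherence goal concrete, here is a minimal sketch of one common way to penalize frame-to-frame jitter: the mean squared change between consecutive per-frame pose vectors. This is an illustrative assumption, not the loss used in 4D-Animal; the function name `temporal_smoothness` and the flat list-of-floats pose representation are hypothetical.

```python
from typing import Sequence


def temporal_smoothness(poses: Sequence[Sequence[float]]) -> float:
    """Mean squared change between consecutive per-frame pose vectors.

    `poses` holds one parameter vector per video frame (hypothetical layout).
    Lower values mean smoother motion across frames.
    """
    if len(poses) < 2:
        return 0.0  # a single frame has no temporal change to penalize
    total = 0.0
    count = 0
    for prev, curr in zip(poses, poses[1:]):
        if len(prev) != len(curr):
            raise ValueError("all pose vectors must have the same length")
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
        count += len(curr)
    return total / count if count else 0.0


# Toy sequences: the second one oscillates, so it is penalized more heavily.
smooth = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]]
jittery = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.0]]
print(temporal_smoothness(smooth) < temporal_smoothness(jittery))
```

A term like this would typically be added to the per-frame fitting objective with a small weight, so that it discourages jitter without suppressing real motion.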

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dense feature network maps 2D to SMAL parameters
Hierarchical alignment integrates multi-level visual cues
No sparse keypoint annotations required for reconstruction
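
The hierarchical alignment idea can be illustrated as a weighted sum of per-level loss terms. The sketch below is an assumption-laden toy, not the authors' implementation: it uses 1 − IoU as a stand-in silhouette term (a common choice, not confirmed by the abstract), and the helper names, the term values for the part, pixel, and temporal levels, and all weights are hypothetical.

```python
from typing import Dict, List

Mask = List[List[int]]  # binary mask: 1 = foreground, 0 = background


def silhouette_iou_loss(rendered: Mask, target: Mask) -> float:
    """Return 1 - IoU between a rendered mask and a target mask."""
    inter = 0
    union = 0
    for r_row, t_row in zip(rendered, target):
        for r, t in zip(r_row, t_row):
            inter += 1 if (r and t) else 0
            union += 1 if (r or t) else 0
    # Two empty masks are treated as perfectly aligned.
    return 0.0 if union == 0 else 1.0 - inter / union


def hierarchical_loss(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of alignment terms; every term needs a weight."""
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"missing weights for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())


rendered = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]

terms = {
    "silhouette": silhouette_iou_loss(rendered, target),
    "part": 0.2,       # placeholder value for a part-level term
    "pixel": 0.1,      # placeholder value for a pixel-level term
    "temporal": 0.05,  # placeholder value for a temporal term
}
# Hypothetical weights, chosen only for illustration.
weights = {"silhouette": 1.0, "part": 0.5, "pixel": 0.5, "temporal": 0.1}
total = hierarchical_loss(terms, weights)
print(round(total, 3))
```

Keeping each level as a named term makes it easy to ablate one cue at a time by setting its weight to zero.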