🤖 AI Summary
This work addresses multi-animal 3D reconstruction in complex wild scenes, where large species variation, frequent occlusions, and crowded multi-animal settings make the task difficult. It proposes SAM 3D Animal, the first promptable framework for multi-animal 3D reconstruction from a single image. Built on the SMAL+ parametric animal model, the method jointly reconstructs multiple instances and accepts keypoint and mask prompts to disambiguate animals in crowded and occluded scenes. The study also introduces Herd3D, a multi-animal 3D dataset of over 5K images designed to increase diversity in species, interactions, and occlusion patterns. On the Animal3D, APTv2, and Animal Kingdom datasets, the framework reports state-of-the-art results, outperforming existing model-based and model-free methods.
📝 Abstract
3D animal reconstruction in the wild remains challenging due to large species variation, frequent occlusions, and the prevalence of multi-animal scenes, while existing methods predominantly focus on single-animal settings. We present SAM 3D Animal, the first promptable framework for multi-animal 3D reconstruction from a single image. Built on the SMAL+ parametric animal model, our method jointly reconstructs multiple instances and supports flexible prompts in the form of keypoints and masks, which enable more reliable disambiguation in crowded and occluded scenes. To train such a model, we further introduce Herd3D, a multi-animal 3D dataset containing over 5K images, designed to increase diversity in species, interactions, and occlusion patterns. Experiments on the Animal3D, APTv2, and Animal Kingdom datasets show that our framework achieves state-of-the-art results, outperforming existing model-based and model-free methods and demonstrating a scalable and effective solution for prompt-driven animal 3D reconstruction in the wild.