🤖 AI Summary
Non-rigid 3D pose reconstruction from single-view depth images traditionally relies on large-scale annotated datasets to cover complex deformation spaces, resulting in degraded performance under few-shot settings.
Method: This paper proposes a canonical-pose alignment framework that decouples non-rigid reconstruction into two stages: (i) learning a mapping from depth images to a canonical voxel representation, and (ii) jointly regressing the original-pose voxel via rigid transformation estimation. A cross-view consistency constraint is further introduced to enhance geometric robustness.
Contribution/Results: The method achieves high-fidelity reconstruction with only ~300 training samples—orders of magnitude fewer than prior approaches—while significantly outperforming state-of-the-art methods on both human and animal benchmarks. It demonstrates strong few-shot adaptability and robustness to pose variation, establishing a new trade-off between data efficiency and geometric accuracy in non-rigid 3D reconstruction.
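The two-stage decoupling described above can be sketched in a minimal, non-learned form. This is an illustration only: the function names are hypothetical, the canonicalization here is a trivial depth back-projection into a voxel grid (in the paper this stage is a learned network), and the rigid transform `(R, t)` is supplied rather than jointly regressed as the method proposes.

```python
import numpy as np

def depth_to_canonical_voxel(depth, grid=16):
    # Hypothetical stand-in for stage (i): the paper learns this mapping with a
    # network; here we simply back-project foreground depth pixels into a
    # coarse boolean voxel grid.
    h, w = depth.shape
    vox = np.zeros((grid, grid, grid), dtype=bool)
    ys, xs = np.nonzero(depth > 0)
    zs = depth[ys, xs]
    # Normalize pixel/depth coordinates into [0, grid) voxel indices.
    gx = np.clip((xs / w * grid).astype(int), 0, grid - 1)
    gy = np.clip((ys / h * grid).astype(int), 0, grid - 1)
    gz = np.clip((zs / zs.max() * (grid - 1)).astype(int), 0, grid - 1)
    vox[gx, gy, gz] = True
    return vox

def apply_rigid_transform(voxel, R, t, grid=16):
    # Stand-in for stage (ii): re-pose the canonical voxel with a rotation R
    # and translation t. In the paper (R, t) is estimated jointly during
    # reconstruction, not given as input.
    idx = np.argwhere(voxel).astype(float)
    centered = idx - grid / 2          # rotate about the grid center
    posed = centered @ R.T + t + grid / 2
    out = np.zeros_like(voxel)
    keep = np.all((posed >= 0) & (posed < grid), axis=1)
    pi = posed[keep].astype(int)       # drop voxels transformed out of bounds
    out[pi[:, 0], pi[:, 1], pi[:, 2]] = True
    return out
```

Under an identity rotation and zero translation, the re-posed voxel matches the canonical one, which makes the decoupling easy to sanity-check before swapping in learned components.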
📝 Abstract
3D reconstruction from 2D inputs, especially for non-rigid objects like humans, presents unique challenges due to the significant range of possible deformations. Traditional methods often struggle with non-rigid shapes, which require extensive training data to cover the entire deformation space. This study addresses these limitations by proposing a canonical pose reconstruction model that transforms single-view depth images of deformable shapes into a canonical form. This alignment enables the application of rigid-object reconstruction techniques, and additionally supports recovering the input pose in voxel representation as part of the reconstruction task by utilizing both the original and canonicalized depth images. Notably, our model achieves effective results with only a small dataset of approximately 300 samples. Experimental results on animal and human datasets demonstrate that our model outperforms other state-of-the-art methods.