🤖 AI Summary
Deformable 3D models learned from single-view in-the-wild images struggle to keep semantics consistent across instances, which limits their performance on semantic correspondence tasks. This work proposes a category-level, semantically consistent deformable 3D representation that treats reconstruction as a means rather than an end. Each category has a canonical template mesh that an image-driven deformation field warps to fit each instance, so vertices keep stable correspondences and geometric deformation is explicitly coupled with semantic alignment. The key components are a feature-level consistency loss and a vertex-index-conditioned deformation mechanism. On SPair-71k, the approach improves the semantic correspondence of deformable models by +14.7 PCK@0.1, indicating that deformable models can serve as effective semantic 3D representations.
📝 Abstract
Learning deformable 3D object models from single-view in-the-wild images has enabled impressive 3D shape reconstruction without supervision. However, it remains unclear whether these models capture the semantic structure required for downstream tasks. We find that existing deformable reconstruction approaches, despite producing visually plausible geometry, yield unstable correspondences across instances and perform poorly on semantic correspondence benchmarks. We introduce SEMAGIC, a framework for learning semantically consistent deformable 3D representations from single-view in-the-wild images. Rather than treating reconstruction as the end goal, SEMAGIC uses deformable modeling as a mechanism to discover category-level correspondences. Each category is represented by a canonical template mesh and a learned deformation field. Together they function similarly to an autoencoder that reconstructs instance geometry from image features, which lets vertices maintain consistent semantic meaning across instances. Semantic consistency is enforced during training through (i) a feature-level consistency loss that aligns semantic features between canonical and deformed meshes, and (ii) vertex-index-conditioned deformation that preserves semantic correspondence across instances. By explicitly coupling geometric deformation with semantic alignment, SEMAGIC produces representations that maintain stable part correspondences across intra-category variation. Experiments demonstrate that SEMAGIC improves the semantic correspondence of deformable models by +14.7 PCK@0.1 on SPair-71k, establishing deformable models as effective semantic 3D representations.
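For readers unfamiliar with the reported metric, the following is a minimal sketch of how PCK@0.1 (percentage of correct keypoints) is commonly computed. It assumes the convention in which a transferred keypoint counts as correct when it lies within `alpha * max(h, w)` of the ground truth, with `h` and `w` taken from the target object's bounding box. The function name `pck`, its parameters, and the toy keypoints are hypothetical illustrations, not the authors' evaluation code.

```python
import math


def pck(pred_kps, gt_kps, bbox_hw, alpha=0.1):
    """Fraction of predicted keypoints within alpha * max(h, w) of ground truth.

    pred_kps, gt_kps: sequences of (x, y) pixel coordinates, same length.
    bbox_hw: (height, width) of the target object's bounding box (assumed
             normalization; some protocols use the image size instead).
    """
    if len(pred_kps) != len(gt_kps):
        raise ValueError("pred_kps and gt_kps must have the same length")
    if not gt_kps:
        raise ValueError("at least one keypoint is required")

    threshold = alpha * max(bbox_hw)
    correct = sum(
        math.hypot(px - gx, py - gy) <= threshold
        for (px, py), (gx, gy) in zip(pred_kps, gt_kps)
    )
    return correct / len(gt_kps)


# Toy data (made up for illustration): four keypoints on a 100 x 80 box,
# so the threshold is 0.1 * 100 = 10 pixels.
gt = [(10.0, 10.0), (50.0, 40.0), (80.0, 90.0), (30.0, 70.0)]
pred = [(13.0, 14.0), (50.0, 52.0), (86.0, 96.0), (30.0, 70.0)]
score = pck(pred, gt, bbox_hw=(100, 80))
print(f"PCK@0.1 = {score:.2f}")
```

In the toy data, the second prediction is 12 pixels from its target and is the only miss. Under this convention, a +14.7 gain means that 14.7 percentage points more keypoints fall inside the threshold.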