Best Segmentation Buddies for Image-Shape Correspondence

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses segmentation-to-segmentation correspondence between natural images and untextured 3D shapes, a task made difficult by large differences in appearance, geometry, and viewpoint. The method first distills deep features from a 2D vision model onto the surface of the 3D shape, so that feature similarity can be computed between image pixels and shape vertices. It then identifies “Best Segmentation Buddies”: vertices whose most similar image pixel lies within the image segmentation region, which allows vertices in the semantically corresponding shape part to be found reliably. Finally, features distilled from the 2D image segmentation model are used to segment the shape directly in 3D, bootstrapping the correspondence process. The approach requires neither surface textures nor manual annotations, and the authors report accurate, semantically meaningful correspondences across a wide range of image-shape pairs.

📝 Abstract

Finding correspondences is a fundamental and extensively researched problem in computer vision and graphics. In this work, we examine the underexplored task of estimating segmentation-to-segmentation correspondence between images in the wild and untextured 3D shapes. This task is highly challenging due to substantial differences in appearance, geometry, and viewpoint. Our approach bridges the cross-modality gap by linking pixels in the image segment to vertices in the corresponding semantic part of the 3D shape. To achieve this, we first distill deep visual features from a 2D vision model onto the 3D shape surface, allowing for the computation of feature similarity between image pixels and shape vertices. Then, we identify Best Segmentation Buddies, vertices whose most similar image pixel lies within the image segmentation region, enabling the reliable discovery of vertices in semantically corresponding shape parts. Finally, we leverage distilled 3D features from the 2D image segmentation model to segment the shape directly in 3D, bootstrapping the correspondence process. We demonstrate the generality and robustness of our approach across a wide range of image-shape pairs, showcasing accurate and semantically meaningful correspondences. Our project page is at https://threedle.github.io/bsb/.
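The Best Segmentation Buddies selection described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `best_segmentation_buddies`, the array layouts, and the use of cosine similarity as the feature similarity are all assumptions made for illustration, and the feature distillation step is taken as already done (the per-vertex features are simply given as input).

```python
import numpy as np

def best_segmentation_buddies(pixel_feats, vertex_feats, mask):
    """Return a boolean flag per vertex (hypothetical helper, not the paper's code).

    pixel_feats : (H, W, D) per-pixel features from a 2D vision model
    vertex_feats: (V, D) features assumed to be already distilled onto the shape
    mask        : (H, W) boolean image segmentation region
    """
    h, w, d = pixel_feats.shape
    pixels = pixel_feats.reshape(h * w, d)

    # Assumption: cosine similarity; the abstract only says "feature similarity".
    eps = 1e-8
    pixels = pixels / (np.linalg.norm(pixels, axis=1, keepdims=True) + eps)
    verts = vertex_feats / (np.linalg.norm(vertex_feats, axis=1, keepdims=True) + eps)

    similarity = verts @ pixels.T            # (V, H*W) vertex-to-pixel similarity
    best_pixel = similarity.argmax(axis=1)   # most similar pixel for each vertex

    # A vertex is kept if its most similar pixel lies inside the segment.
    return mask.reshape(-1)[best_pixel]

# Toy example: a 2x2 image whose top row is the segment of interest.
pixel_feats = np.array([[[1.0, 0.0], [1.0, 0.0]],
                        [[0.0, 1.0], [0.0, 1.0]]])
mask = np.array([[True, True],
                 [False, False]])
vertex_feats = np.array([[0.9, 0.1],   # resembles the top-row pixels
                         [0.1, 0.9],   # resembles the bottom-row pixels
                         [1.0, 0.0]])  # matches the top-row pixels exactly
buddies = best_segmentation_buddies(pixel_feats, vertex_feats, mask)
print(buddies.tolist())  # [True, False, True]
```

The dense similarity matrix keeps the sketch short; for real images and meshes it has V × H × W entries, so a practical version would likely process vertices in chunks.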

Problem

Research questions and friction points this paper is trying to address.

image-shape correspondence

segmentation correspondence

cross-modality alignment

3D shape segmentation

semantic correspondence

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-modality correspondence

feature distillation

3D shape segmentation