🤖 AI Summary
This work addresses the limited generalization of existing semantic correspondence methods to query points beyond the annotated keypoints, which is the common case in real-world scenarios. To overcome this, the authors propose a unified framework built upon DINOv2 that transforms sparse keypoint supervision into globally consistent dense correspondences through a coarse-to-fine spatial refinement strategy, a self-distillation mechanism, and a sparse-to-dense correspondence expansion module. The approach improves fine-grained localization accuracy and achieves new state-of-the-art results on SPair-71k, AP-10K, and PF-PASCAL, with an 8.9-point gain in PCK@0.01 and improvements of 5.1 and 4.7 points in generalization to unseen keypoints (SPair-U) and novel categories (MP-100), respectively. Moreover, the model is three times smaller and ten times faster at inference than prior diffusion-based approaches.
📝 Abstract
Recent advances in semantic correspondence rely on dual-encoder architectures that combine DINOv2 with diffusion backbones. While accurate, these billion-parameter models generalize poorly beyond training keypoints, revealing a gap between benchmark performance and real-world usability, where queried points rarely match those seen during training. Building upon DINOv2, we introduce MARCO, a unified model for generalizable correspondence driven by a novel training framework that enhances both fine-grained localization and semantic generalization. By coupling a coarse-to-fine objective that refines spatial precision with a self-distillation framework that expands sparse supervision beyond annotated regions, our approach transforms a handful of keypoints into dense, semantically coherent correspondences. MARCO sets a new state of the art on SPair-71k, AP-10K, and PF-PASCAL, with gains that amplify at fine-grained localization thresholds (+8.9 PCK@0.01), and achieves the strongest generalization to unseen keypoints (+5.1, SPair-U) and categories (+4.7, MP-100), while remaining 3x smaller and 10x faster than diffusion-based approaches. Code is available at https://github.com/visinf/MARCO.
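
The PCK@0.01 figure quoted above refers to the Percentage of Correct Keypoints at a very tight distance threshold. As a rough illustration of why small thresholds stress fine-grained localization, the sketch below computes PCK for one image pair. It is not taken from the MARCO code base: the function name `pck`, its arguments, and the choice to normalize by the longer side of the object bounding box are assumptions made for this example, and benchmarks differ in which reference size they use.

```python
import numpy as np


def pck(pred, gt, bbox_hw, alpha=0.1):
    """Fraction of predicted keypoints within alpha * max(bbox_hw) of the ground truth.

    pred, gt : (N, 2) sequences of (x, y) coordinates in pixels.
    bbox_hw  : (height, width) of the target object's bounding box.
               Assumption: the threshold scales with the longer bbox side.
    alpha    : relative threshold, e.g. 0.1 or 0.01.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Euclidean distance between each prediction and its ground-truth keypoint.
    dist = np.linalg.norm(pred - gt, axis=1)
    # Absolute pixel threshold derived from the bounding-box size.
    threshold = alpha * max(bbox_hw)
    return float(np.mean(dist <= threshold))


# Hypothetical keypoints: errors of 1 px, 5 px, and 0 px.
gt_points = [[10, 10], [50, 50], [90, 20]]
pred_points = [[10, 11], [53, 54], [90, 20]]
bbox = (100, 200)  # threshold is 20 px at alpha=0.1 and 2 px at alpha=0.01

loose = pck(pred_points, gt_points, bbox, alpha=0.1)
tight = pck(pred_points, gt_points, bbox, alpha=0.01)
print(f"PCK@0.1 = {loose:.3f}, PCK@0.01 = {tight:.3f}")
```

In this toy case all three predictions count as correct at the loose threshold, but the 5-pixel error fails at the tight one, so the score is lower at PCK@0.01 than at PCK@0.1 for the same predictions.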