🤖 AI Summary
Problem: Existing correspondence matching methods (stereo matching, optical flow, and feature matching) are typically task-specific, relying on dedicated architectures or fine-tuning. This hinders generalization across tasks and domains.
Method: We propose PanMatch, a universal foundation model for two-frame correspondence matching. It casts every matching task as 2D displacement estimation, uses Large Vision Models (LVMs) as generic feature extractors, and adds a lightweight feature transformation pipeline to enable zero-shot cross-task and cross-domain transfer. PanMatch is pre-trained on a large-scale, cross-domain dataset of nearly 1.8 million samples.
Contribution/Results: PanMatch performs inference on all three tasks with a single set of shared weights. Experiments show that it outperforms UniMatch and Flow-Anything in cross-task evaluation, performs comparably to most state-of-the-art task-specific methods on task-oriented benchmarks, and is markedly more robust in zero-shot settings under challenging conditions such as rainy scenes and satellite imagery.
📝 Abstract
This work presents PanMatch, a versatile foundation model for robust correspondence matching. Previous methods rely on task-specific architectures and domain-specific fine-tuning to support tasks such as stereo matching, optical flow, or feature matching. Our key insight is that any two-frame correspondence matching task can be addressed within a 2D displacement estimation framework using the same model weights. Such a formulation eliminates the need to design specialized unified architectures or task-specific ensemble models. Instead, it achieves multi-task integration by endowing displacement estimation algorithms with unprecedented generalization capabilities. To this end, we highlight the importance of a robust feature extractor that is applicable across multiple domains and tasks, and propose a feature transformation pipeline that leverages all-purpose features from Large Vision Models to endow matching baselines with zero-shot cross-view matching capabilities. Furthermore, we assemble a cross-domain dataset of nearly 1.8 million samples from the stereo matching, optical flow, and feature matching domains to pretrain PanMatch. We demonstrate the versatility of PanMatch across a wide range of domains and downstream tasks using the same model weights. Our model outperforms UniMatch and Flow-Anything on cross-task evaluations, and achieves performance comparable to most state-of-the-art task-specific algorithms on task-oriented benchmarks. Additionally, PanMatch delivers unprecedented zero-shot performance in abnormal scenarios, such as rainy scenes and satellite imagery, where most existing robust algorithms fail to yield meaningful results.
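To make the unifying idea concrete, the sketch below shows how the outputs of the three tasks can share one representation: a dense 2D displacement field. It is an illustration only, not PanMatch's implementation. The function names (`disparity_to_flow`, `flow_to_matches`) are hypothetical, and the stereo sign convention (rectified pair, left image as reference, positive disparity shifting matches leftward) is an assumption of this example rather than something the abstract specifies.

```python
from typing import List, Tuple

# Hypothetical representation: flow[y][x] = (dx, dy), the displacement of
# pixel (x, y) in the first frame to its match in the second frame.
Flow = List[List[Tuple[float, float]]]
Match = Tuple[Tuple[int, int], Tuple[float, float]]


def disparity_to_flow(disparity: List[List[float]]) -> Flow:
    """Express rectified-stereo disparity as a 2D displacement field.

    Assumes the left image is the reference, so pixel (x, y) matches
    (x - d, y) in the right image: horizontal shift only, dy = 0.
    """
    return [[(-d, 0.0) for d in row] for row in disparity]


def flow_to_matches(flow: Flow, stride: int = 1) -> List[Match]:
    """Sample sparse point correspondences from a dense displacement field.

    Targets that land outside the frame are dropped.
    """
    height, width = len(flow), len(flow[0])
    matches: List[Match] = []
    for y in range(0, height, stride):
        for x in range(0, width, stride):
            dx, dy = flow[y][x]
            tx, ty = x + dx, y + dy
            if 0 <= tx <= width - 1 and 0 <= ty <= height - 1:
                matches.append(((x, y), (tx, ty)))
    return matches


# A 1x3 disparity map; the middle pixel's match falls outside the frame.
disparity = [[0.0, 2.0, 1.0]]
flow = disparity_to_flow(disparity)
matches = flow_to_matches(flow)
```

Under these assumptions, stereo matching is a displacement field whose vertical component is zero, optical flow is the field itself, and feature matching is a sparse sampling of that field, which is why a single displacement estimator can in principle serve all three.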