🤖 AI Summary
This work addresses the challenge of establishing object-level visual correspondences across disparate video viewpoints, such as first-person and third-person perspectives. It proposes a self-supervised framework based on conditional binary segmentation, wherein a query object mask is encoded into a latent representation to guide the localization of its corresponding instance in the target video. A cycle-consistent mask prediction mechanism is introduced to generate a strong self-supervisory signal without requiring ground-truth annotations. Furthermore, the approach integrates test-time training (TTT) to substantially enhance generalization to unseen viewpoints. Evaluated on the Ego-Exo4D and HANDAL-X benchmarks, the method achieves state-of-the-art performance, demonstrating its effectiveness and robustness in cross-view correspondence learning.
📝 Abstract
We study the task of establishing object-level visual correspondence across different viewpoints in videos, focusing on the challenging egocentric-to-exocentric and exocentric-to-egocentric scenarios. We propose a simple yet effective framework based on conditional binary segmentation, where an object query mask is encoded into a latent representation to guide the localization of the corresponding object in a target video. To encourage robust, view-invariant representations, we introduce a cycle-consistency training objective: the predicted mask in the target view is projected back to the source view to reconstruct the original query mask. This bidirectional constraint provides a strong self-supervisory signal without requiring ground-truth annotations and enables test-time training (TTT) at inference. Experiments on the Ego-Exo4D and HANDAL-X benchmarks demonstrate the effectiveness of our optimization objective and TTT strategy, achieving state-of-the-art performance. The code is available at https://github.com/shannany0606/CCMP.
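The cycle-consistency objective described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the module names (`MaskEncoder`, `ConditionalSegmenter`, `cycle_consistency_loss`), architectures, and conditioning scheme are all assumptions made for clarity. The key idea it demonstrates is the bidirectional constraint: the source query mask is encoded, used to predict a mask in the target view, and that prediction is re-encoded to reconstruct the original query mask, so the loss needs no ground-truth correspondence.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# NOTE: all class/function names and architectures below are hypothetical
# placeholders, not the authors' code.

class MaskEncoder(nn.Module):
    """Encodes a binary query mask (stacked with its RGB frame) into a latent vector."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, dim, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, frame, mask):
        x = torch.cat([frame, mask], dim=1)   # (B, 3+1, H, W)
        return self.net(x).flatten(1)         # (B, dim)

class ConditionalSegmenter(nn.Module):
    """Predicts a binary mask in a frame, conditioned on the query latent."""
    def __init__(self, dim=64):
        super().__init__()
        self.img = nn.Conv2d(3, dim, 3, padding=1)
        self.head = nn.Conv2d(dim, 1, 1)

    def forward(self, frame, query):
        feat = self.img(frame)                      # (B, dim, H, W)
        feat = feat * query[:, :, None, None]       # simple channel-wise conditioning
        return self.head(feat)                      # mask logits (B, 1, H, W)

def cycle_consistency_loss(encoder, segmenter, src_frame, src_mask, tgt_frame):
    """Forward: source query -> target prediction; backward: re-encode the
    prediction and reconstruct the source mask. Supervision comes from the
    original query mask alone, so no cross-view annotation is required."""
    q_src = encoder(src_frame, src_mask)
    tgt_mask = torch.sigmoid(segmenter(tgt_frame, q_src))  # soft predicted mask
    q_tgt = encoder(tgt_frame, tgt_mask)                   # close the cycle
    src_logits = segmenter(src_frame, q_tgt)
    return F.binary_cross_entropy_with_logits(src_logits, src_mask)
```

Because the loss depends only on the input query mask, the same objective can in principle be minimized on an unlabeled test pair, which is how a cycle-consistency signal naturally supports test-time training.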