🤖 AI Summary
This work addresses the challenging problem of estimating relative 3D camera orientations from unconstrained Internet-sourced image pairs exhibiting extreme viewpoint differences and non-overlapping fields of view, a scenario where existing methods, reliant on controlled 3D environments or synthetic data, fail. To tackle this, we introduce ExtremeLandmarkPairs, the first benchmark dataset specifically designed for real-world extreme-viewpoint pose estimation. We propose a novel Transformer-based architecture tailored for in-the-wild images, enabling end-to-end rotation estimation. Our method integrates self-supervised keypoint matching with geometric consistency constraints, ensuring robust inference even under zero field-of-view overlap. Extensive experiments demonstrate that our approach significantly outperforms specialized rotation estimation algorithms and state-of-the-art 3D reconstruction methods on extreme-viewpoint pairs, while generalizing well across diverse real-world scenes. This work establishes a new paradigm for camera orientation estimation in unstructured, large-baseline settings.
📝 Abstract
We present a technique and benchmark dataset for estimating the relative 3D orientation between a pair of Internet images captured in an extreme setting, where the images have limited or non-overlapping fields of view. Prior work targeting extreme rotation estimation assumes constrained 3D environments and emulates perspective images by cropping regions from panoramic views. However, real images captured in the wild are highly diverse, exhibiting variation in both appearance and camera intrinsics. In this work, we propose a Transformer-based method for estimating relative rotations in extreme real-world settings, and contribute the ExtremeLandmarkPairs dataset, assembled from scene-level Internet photo collections. Our evaluation demonstrates that our approach succeeds in estimating the relative rotations in a wide variety of extreme-view Internet image pairs, outperforming various baselines, including dedicated rotation estimation techniques and contemporary 3D reconstruction methods.
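To make the quantity being estimated concrete, the following is a minimal sketch of how a relative rotation between two cameras can be formed and how a predicted rotation can be scored against a reference. It is an illustration only and not the method from the abstract: the world-to-camera convention, the helper names (`rotation_y`, `relative_rotation`, `geodesic_error_deg`), and the choice of geodesic angular error as the metric are all assumptions made for this example.

```python
import numpy as np


def rotation_y(deg):
    """Rotation matrix for a yaw of `deg` degrees about the y axis (hypothetical helper)."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def relative_rotation(R1, R2):
    """Relative rotation taking camera-1 coordinates to camera-2 coordinates.

    Assumes R1 and R2 are world-to-camera rotation matrices (an assumption
    of this sketch, not something stated in the abstract).
    """
    return R2 @ R1.T


def geodesic_error_deg(R_pred, R_gt):
    """Angle, in degrees, of the rotation that separates R_pred from R_gt."""
    cos_angle = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


# Two cameras whose yaw differs by 120 degrees, a large relative rotation
# of the kind that can leave little or no field-of-view overlap.
R1 = rotation_y(10.0)
R2 = rotation_y(130.0)
R_rel = relative_rotation(R1, R2)

# A hypothetical prediction that is off by 20 degrees of yaw.
R_pred = rotation_y(100.0)
print(geodesic_error_deg(R_pred, R_rel))
```

Here the two cameras differ only in yaw, so the relative rotation is a 120-degree yaw and the hypothetical prediction is scored at approximately 20 degrees of error. Note that only rotation is involved: no translation or scene geometry is needed to define or evaluate the estimate.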