PoseCrafter: Extreme Pose Estimation with Hybrid Video Synthesis

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Camera pose estimation for image pairs with sparse or zero overlap remains a longstanding challenge in 3D vision. This paper proposes a hybrid video generation framework tailored to pose estimation. It first synthesizes high-fidelity intermediate frames by jointly leveraging video interpolation and pose-conditioned novel view synthesis. It then introduces a Feature Matching Selector (FMS) that adaptively identifies the most informative intermediate frames for robust pose estimation. Crucially, the method does not rely on initial overlap between the input images, which mitigates the matching failures caused by large viewpoint disparities. Extensive evaluations on Cambridge Landmarks, ScanNet, DL3DV-10K, and NAVI demonstrate consistent state-of-the-art performance, particularly under extreme low- or zero-overlap conditions, while remaining computationally efficient.

📝 Abstract
Pairwise camera pose estimation from sparsely overlapping image pairs remains a critical and unsolved challenge in 3D vision. Most existing methods struggle with image pairs that have small or no overlap. Recent approaches attempt to address this by synthesizing intermediate frames using video interpolation and selecting key frames via a self-consistency score. However, the generated frames are often blurry when the inputs overlap only slightly, and the selection strategies are slow and not explicitly aligned with pose estimation. To address these issues, we propose Hybrid Video Generation (HVG), which synthesizes clearer intermediate frames by coupling a video interpolation model with a pose-conditioned novel view synthesis model. We also propose a Feature Matching Selector (FMS), based on feature correspondence, that selects intermediate frames appropriate for pose estimation from the synthesized results. Extensive experiments on Cambridge Landmarks, ScanNet, DL3DV-10K, and NAVI demonstrate that, compared to existing SOTA methods, PoseCrafter clearly improves pose estimation performance, especially on examples with small or no overlap.
Problem

Research questions and friction points this paper is trying to address.

Estimating camera pose from image pairs with sparse or no overlap
Addressing blurry frame synthesis in small overlap scenarios
Improving frame selection efficiency for pose estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Video Generation synthesizes clearer intermediate frames
Feature Matching Selector picks frames via feature correspondence
Coupling video interpolation with pose-conditioned view synthesis
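
The summary only states that the Feature Matching Selector picks frames "via feature correspondence"; it does not give the scoring rule. As a rough illustration of what correspondence-based selection could look like, the sketch below scores each synthesized frame by the number of mutual nearest-neighbour descriptor matches it shares with each input image, and keeps the top-k frames. Everything here is an assumption made for illustration, not the authors' method: the function names `mutual_nn_matches` and `select_frames`, the cosine-similarity threshold, the `min` over the two endpoints, and the use of precomputed descriptor arrays.

```python
import numpy as np


def mutual_nn_matches(desc_a, desc_b, min_sim=0.9):
    """Count mutual nearest-neighbour matches between two descriptor sets.

    desc_a: (Na, D) array, desc_b: (Nb, D) array. Hypothetical helper;
    min_sim is an assumed cosine-similarity threshold.
    """
    # L2-normalise rows so the dot product is cosine similarity.
    a = desc_a / (np.linalg.norm(desc_a, axis=1, keepdims=True) + 1e-8)
    b = desc_b / (np.linalg.norm(desc_b, axis=1, keepdims=True) + 1e-8)
    sim = a @ b.T
    nn_ab = sim.argmax(axis=1)  # best index in b for every row of a
    nn_ba = sim.argmax(axis=0)  # best index in a for every row of b
    rows = np.arange(len(a))
    # A match i -> j counts only if j -> i as well and it is similar enough.
    mutual = nn_ba[nn_ab] == rows
    strong = sim[rows, nn_ab] >= min_sim
    return int(np.sum(mutual & strong))


def select_frames(desc_first, desc_last, desc_frames, k=2):
    """Return (sorted indices of the k best frames, per-frame scores).

    Assumed scoring rule: a frame is only as useful as its weaker link,
    so its score is the smaller of its match counts to the two inputs.
    """
    scores = [
        min(mutual_nn_matches(desc_first, d), mutual_nn_matches(d, desc_last))
        for d in desc_frames
    ]
    best = np.argsort(scores)[::-1][:k]
    return sorted(best.tolist()), scores
```

Taking the minimum over both endpoints is one simple way to favour frames that bridge the two views rather than frames that merely resemble one of them; other aggregation rules would be equally plausible under the description given here.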
Qing Mao
School of Computer Science, Northwestern Polytechnical University
Tianxin Huang
The University of Hong Kong
Computer Vision, Computer Graphics
Yu Zhu
School of Computer Science, Northwestern Polytechnical University
Jinqiu Sun
School of Astronautics, Northwestern Polytechnical University
Yanning Zhang
Northwestern Polytechnical University
Computer Vision
Gim Hee Lee
Associate Professor of Computer Science, National University of Singapore
Computer Vision, Robotics, Machine Learning