PoseCrafter: Extreme Pose Estimation with Hybrid Video Synthesis

📅 2025-10-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Camera pose estimation for image pairs with sparse or zero overlap remains a longstanding challenge in 3D vision. This paper proposes a hybrid video generation framework tailored to pose estimation. It first synthesizes clearer intermediate frames by coupling video interpolation with pose-conditioned novel view synthesis, then applies a Feature Matching Selector (FMS) that uses feature correspondence to pick the intermediate frames best suited to pose estimation. Because the intermediate frames bridge the two views, the method does not depend on the input pair sharing visible content, which mitigates the matching failures caused by large viewpoint differences. Experiments on Cambridge Landmarks, ScanNet, DL3DV-10K, and NAVI show improvements over existing state-of-the-art methods, especially on pairs with small or no overlap.

📝 Abstract

Pairwise camera pose estimation from sparsely overlapping image pairs remains a critical and unsolved challenge in 3D vision. Most existing methods struggle with image pairs that have small or no overlap. Recent approaches attempt to address this by synthesizing intermediate frames using video interpolation and selecting key frames via a self-consistency score. However, the generated frames are often blurry when the inputs overlap only slightly, and the selection strategies are slow and not explicitly aligned with pose estimation. To address these issues, we propose Hybrid Video Generation (HVG), which synthesizes clearer intermediate frames by coupling a video interpolation model with a pose-conditioned novel view synthesis model. We also propose a Feature Matching Selector (FMS), based on feature correspondence, to select from the synthesized results the intermediate frames appropriate for pose estimation. Extensive experiments on Cambridge Landmarks, ScanNet, DL3DV-10K, and NAVI demonstrate that, compared to existing SOTA methods, PoseCrafter improves pose estimation performance, especially on examples with small or no overlap.

Problem

Research questions and friction points this paper is trying to address.

Estimating camera pose from sparsely overlapping image pairs

Addressing blurry frame synthesis in small overlap scenarios

Improving frame selection efficiency for pose estimation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Video Generation synthesizes clearer intermediate frames

Feature Matching Selector picks frames via feature correspondence

Coupling video interpolation with pose-conditioned view synthesis
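
The summary above does not spell out how the Feature Matching Selector scores frames, so the following is only a minimal Python sketch of the general idea of ranking synthesized frames by feature correspondence. Everything in it is an assumption made for illustration: the function names `count_matches` and `select_frames`, the `max_dist` threshold, the use of mutual nearest-neighbour matching, and the scoring rule (a frame is only as useful as its weaker link to the two input images). Descriptors are represented as plain tuples of floats; a real pipeline would obtain them from a feature extractor.

```python
from math import dist


def count_matches(desc_a, desc_b, max_dist=0.5):
    """Count mutual nearest-neighbour matches between two descriptor sets.

    A pair (i, j) counts when j is the nearest descriptor in B to A[i],
    i is the nearest descriptor in A to B[j], and their distance is at
    most max_dist (a hypothetical threshold chosen for this sketch).
    """
    if not desc_a or not desc_b:
        return 0
    # Nearest neighbour in B for every descriptor in A, and vice versa.
    a_to_b = [min(range(len(desc_b)), key=lambda j: dist(a, desc_b[j]))
              for a in desc_a]
    b_to_a = [min(range(len(desc_a)), key=lambda i: dist(b, desc_a[i]))
              for b in desc_b]
    return sum(
        1
        for i, j in enumerate(a_to_b)
        if b_to_a[j] == i and dist(desc_a[i], desc_b[j]) <= max_dist
    )


def select_frames(desc_first, desc_second, candidates, top_k=1):
    """Rank synthesized frames by how well they match BOTH input images.

    candidates is a list of descriptor sets, one per intermediate frame.
    Returns the indices of the top_k frames and the score of every frame.
    The min() scoring rule is an assumption of this sketch, not taken
    from the paper.
    """
    scores = [
        min(count_matches(desc_first, c), count_matches(c, desc_second))
        for c in candidates
    ]
    # sorted() is stable, so ties keep their original frame order.
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i],
                    reverse=True)
    return ranked[:top_k], scores
```

Taking the minimum of the two match counts favours frames that connect to both inputs, which is the property a bridging frame needs when the inputs themselves share no features. Brute-force nearest-neighbour search is quadratic in the number of descriptors and is used here only to keep the sketch dependency-free.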
