T-Graph: Enhancing Sparse-view Camera Pose Estimation by Pairwise Translation Graph

📅 2025-05-02
🤖 AI Summary
Sparse-view 6-DoF camera pose estimation for remote sensing imagery is limited by insufficient use of the translation information between view pairs. To address this, we propose T-Graph, a sparse-view pose estimation module featuring: (i) a fully connected pairwise translation graph that explicitly encodes inter-view translational constraints; (ii) two pairwise translation representations defined in different local coordinate systems, relative-t and the rotation-disentangled pair-t, to improve robustness across scenes; and (iii) a plug-and-play design that attaches to an existing model as a parallel branch. Evaluated on CO3D and IMC PhotoTourism in 2- to 8-view settings, T-Graph improves the camera center accuracy of the state-of-the-art methods RelPose++ and Forge by 1% to 6%, which also demonstrates its compatibility with existing pipelines.

📝 Abstract
Sparse-view camera pose estimation, which aims to estimate the 6-Degree-of-Freedom (6-DoF) poses from a limited number of images captured from different viewpoints, is a fundamental yet challenging problem in remote sensing applications. Existing methods often overlook the translation information between each pair of viewpoints, leading to suboptimal performance in sparse-view scenarios. To address this limitation, we introduce T-Graph, a lightweight, plug-and-play module to enhance camera pose estimation in sparse-view settings. T-Graph takes paired image features as input and maps them through a Multilayer Perceptron (MLP). It then constructs a fully connected translation graph, where nodes represent cameras and edges encode their translation relationships. It can be seamlessly integrated into existing models as an additional branch in parallel with the original prediction, maintaining efficiency and ease of use. Furthermore, we introduce two pairwise translation representations, relative-t and pair-t, formulated under different local coordinate systems. While relative-t captures intuitive spatial relationships, pair-t offers a rotation-disentangled alternative. The two representations contribute to enhanced adaptability across diverse application scenarios, further improving our module's robustness. Extensive experiments on two state-of-the-art methods (RelPose++ and Forge) using public datasets (CO3D and IMC PhotoTourism) validate both the effectiveness and generalizability of T-Graph. The results demonstrate consistent improvements across various metrics, notably camera center accuracy, which improves by 1% to 6% in settings with 2 to 8 viewpoints.
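The graph structure described in the abstract can be illustrated with a small sketch. The abstract gives no formulas for relative-t or pair-t, so the code below is not the paper's method: it assumes the common world-to-camera convention `x_cam = R @ x_world + t` and labels each edge with the conventional relative translation between two cameras, which may or may not match relative-t exactly. The learned MLP is omitted (edges are computed from known poses instead of predicted from image features), pair-t is left out because its definition is not available here, and the function names `relative_translation` and `build_translation_graph` are hypothetical.

```python
from itertools import permutations


def transpose(m):
    """Transpose of a 3x3 matrix given as nested lists."""
    return [[m[c][r] for c in range(3)] for r in range(3)]


def mat_mul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def mat_vec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def relative_translation(R_i, t_i, R_j, t_j):
    """Translation part of the map from camera-i to camera-j coordinates.

    Assumed convention: x_cam = R @ x_world + t, which gives
    x_j = (R_j R_i^T) x_i + (t_j - R_j R_i^T t_i).
    """
    R_ij = mat_mul(R_j, transpose(R_i))
    rotated = mat_vec(R_ij, t_i)
    return [t_j[k] - rotated[k] for k in range(3)]


def build_translation_graph(cameras):
    """Fully connected directed graph: nodes are camera indices,
    edge (i, j) stores the relative translation from camera i to camera j."""
    return {
        (i, j): relative_translation(*cameras[i], *cameras[j])
        for i, j in permutations(range(len(cameras)), 2)
    }


# Usage: three cameras with identity rotation and different translations.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
cameras = [(IDENTITY, [0, 0, 0]), (IDENTITY, [1, 0, 0]), (IDENTITY, [0, 2, 0])]
graph = build_translation_graph(cameras)
print(len(graph))     # 6 directed edges for 3 cameras
print(graph[(0, 1)])  # [1, 0, 0]
```

With N cameras the graph has N(N-1) directed edges, so even the 8-view setting yields only 56 pairwise translations, which is consistent with the abstract's description of the module as lightweight.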
Problem

Research questions and friction points this paper is trying to address.

Enhancing sparse-view camera pose estimation accuracy
Utilizing pairwise translation information for better performance
Improving robustness with rotation-disentangled translation representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight plug-and-play module for pose estimation
MLP-based translation graph with pairwise relationships
Dual translation representations for enhanced adaptability
Qingyu Xian
Pervasive Systems Research Group, Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Enschede, The Netherlands
Weiqin Jiao
Department of Earth Observation Science, Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, The Netherlands
Hao Cheng
Department of Earth Observation Science, Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, The Netherlands
Berend J Zwaag
Yanqiu Huang
University of Twente
Wireless sensor networks, Internet of Things, Cyber physical system