SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting

📅 2025-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing single-RGB 6-DoF pose estimation methods depend heavily on the initial pose estimate and are prone to rotational ambiguity, while methods that rely on depth sensors or multi-view inputs are costly to deploy. Method: This paper proposes a pure-RGB pose estimation framework that integrates differentiable rendering of 3D Gaussian Splatting (3DGS) with a dual-branch neural network. It introduces a geometry-domain attention mechanism that decouples positional and angular alignment, and a coarse-to-fine optimization pipeline that corrects both 2D feature misalignment and depth errors caused by sparse ray sampling. Contribution/Results: On three benchmark datasets, the method achieves state-of-the-art accuracy in the single-RGB setting and rivals approaches that depend on depth or multi-view images, without their hardware requirements.

📝 Abstract
6-DoF pose estimation is a fundamental task in computer vision with wide-ranging applications in augmented reality and robotics. Existing single RGB-based methods often compromise accuracy due to their reliance on initial pose estimates and susceptibility to rotational ambiguity, while approaches requiring depth sensors or multi-view setups incur significant deployment costs. To address these limitations, we introduce SplatPose, a novel framework that synergizes 3D Gaussian Splatting (3DGS) with a dual-branch neural architecture to achieve high-precision pose estimation using only a single RGB image. Central to our approach is the Dual-Attention Ray Scoring Network (DARS-Net), which innovatively decouples positional and angular alignment through geometry-domain attention mechanisms, explicitly modeling directional dependencies to mitigate rotational ambiguity. Additionally, a coarse-to-fine optimization pipeline progressively refines pose estimates by aligning dense 2D features between query images and 3DGS-synthesized views, effectively correcting feature misalignment and depth errors from sparse ray sampling. Experiments on three benchmark datasets demonstrate that SplatPose achieves state-of-the-art 6-DoF pose estimation accuracy in single RGB settings, rivaling approaches that depend on depth or multi-view images.
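
For readers new to the task, a 6-DoF pose has three translational and three rotational degrees of freedom, and accuracy is commonly reported as a translation error plus a rotation error. The sketch below is a generic illustration of those two quantities, not code from the paper: the function names `rotation_z` and `pose_errors` and the sample poses are hypothetical, and the rotation error is taken to be the geodesic angle between the two rotation matrices.

```python
import math

def rotation_z(angle_deg):
    """3x3 rotation matrix for a rotation about the z-axis (illustrative helper)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Return (translation error, rotation error in degrees) between two poses."""
    # Translation error: Euclidean distance between the two positions.
    t_err = math.dist(t_est, t_gt)
    # trace(R_est^T @ R_gt) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Geodesic angle of the relative rotation; clamp guards against round-off.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return t_err, math.degrees(math.acos(cos_angle))

# Hypothetical ground-truth and estimated poses, for illustration only.
R_gt, t_gt = rotation_z(30.0), [1.0, 2.0, 0.5]
R_est, t_est = rotation_z(35.0), [1.0, 2.0, 0.8]
t_err, r_err = pose_errors(R_est, t_est, R_gt, t_gt)
print(f"translation error: {t_err:.3f}, rotation error: {r_err:.3f} deg")
```

Reporting the two errors separately mirrors the abstract's distinction between positional and angular alignment: an estimate can be close in position while still being off in orientation, and the reverse.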
Problem

Research questions and friction points this paper is trying to address.

Improves 6-DoF pose estimation from single RGB images
Reduces reliance on depth sensors or multi-view setups
Mitigates rotational ambiguity and initial pose dependency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 3D Gaussian Splatting for pose estimation
Implements Dual-Attention Ray Scoring Network
Employs coarse-to-fine optimization pipeline
Linqi Yang
State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150006, China; Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou 450000, China
Xiongwei Zhao
Ph.D Candidate, Harbin Institute of Technology
3D Perception, World Model, LLM, Embodied AI, Autonomous System
Qihao Sun
State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150006, China; Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou 450000, China
Ke Wang
State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150006, China
Ao Chen
State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150006, China
Peng Kang
Northwestern University
vision, neuromorphic engineering, machine learning, recommendation systems, numerical analysis