RAG-6DPose: Retrieval-Augmented 6D Pose Estimation via Leveraging CAD as Knowledge Base

📅 2025-06-23
🤖 AI Summary
To address the limited robustness of 6D object pose estimation under occlusion and novel viewpoints, this paper proposes a retrieval-augmented multimodal pose estimation framework. The method builds a CAD knowledge base that combines multi-view rendered images with 3D point clouds, and introduces the ReSPC cross-modal matching module to achieve geometrically consistent alignment and pose refinement between query images and CAD models. It jointly exploits visual semantics and geometric priors through multimodal feature extraction, rendering-based 2D–3D alignment, retrieval-augmented decoding, and efficient CAD model retrieval. Evaluated on standard benchmarks, including LINEMOD and OCCLUSION, and on real-world robotic grasping tasks, the approach reports significant improvements in pose accuracy (average +8.2% ADD-S) and in robustness under occlusion and viewpoint variation. The work offers a generalizable paradigm for pose perception in robotic manipulation.
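For readers unfamiliar with the ADD-S figure quoted above, the following is a minimal sketch of the metric as it is usually defined: the mean distance from each model point under the ground-truth pose to its nearest model point under the estimated pose. The function names, the toy model, and the poses are illustrative assumptions and do not reproduce the paper's evaluation protocol.

```python
import math

def transform(points, R, t):
    """Apply rotation R (3x3 nested lists) and translation t (length 3) to each point."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def add_s(points, R_gt, t_gt, R_est, t_est):
    """Mean distance from each ground-truth-posed point to its nearest estimated-posed point."""
    gt = transform(points, R_gt, t_gt)
    est = transform(points, R_est, t_est)
    return sum(min(math.dist(g, e) for e in est) for g in gt) / len(gt)

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ROT_Z_180 = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
# Hypothetical toy model that is symmetric under a 180-degree rotation about z.
MODEL = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]

# A symmetry-equivalent pose is not penalized by the nearest-point matching.
symmetric_error = add_s(MODEL, IDENTITY, [0, 0, 0], ROT_Z_180, [0, 0, 0])
# A small translation error shows up directly in the score.
shifted_error = add_s(MODEL, IDENTITY, [0, 0, 0], IDENTITY, [0.1, 0, 0])
```

Because each point is matched to its nearest neighbor rather than to its own counterpart, the metric tolerates poses that differ only by an object symmetry, which is why it is commonly used for symmetric objects.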

📝 Abstract
Accurate 6D pose estimation is key for robotic manipulation, enabling precise object localization for tasks like grasping. We present RAG-6DPose, a retrieval-augmented approach that leverages 3D CAD models as a knowledge base by integrating both visual and geometric cues. RAG-6DPose consists of roughly three stages: 1) building a multi-modal CAD knowledge base by extracting 2D visual features from multi-view rendered images of the CAD model and attaching 3D points; 2) retrieving relevant CAD features from the knowledge base for the current query image via our ReSPC module; and 3) incorporating the retrieved CAD information to refine pose predictions via retrieval-augmented decoding. Experimental results on standard benchmarks and real-world robotic tasks demonstrate the effectiveness and robustness of our approach, particularly in handling occlusions and novel viewpoints. Supplementary material is available on our project website: https://sressers.github.io/RAG-6DPose.
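The first two stages described in the abstract can be illustrated with a toy sketch. It assumes a knowledge base of (3D point, feature vector) pairs and uses a plain cosine-similarity top-k lookup as a stand-in for retrieval. The names `build_knowledge_base` and `retrieve`, the feature values, and the similarity measure are all hypothetical. The actual ReSPC module and the learned retrieval-augmented decoder (stage 3) are not specified here and are not reproduced.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_knowledge_base(points, features):
    """Stage 1 (toy): pair each 3D CAD point with a visual feature vector."""
    return list(zip(points, features))

def retrieve(knowledge_base, query_feature, k=2):
    """Stage 2 (toy): return the k entries whose features best match the query."""
    ranked = sorted(
        knowledge_base,
        key=lambda entry: cosine(entry[1], query_feature),
        reverse=True,
    )
    return ranked[:k]

# Made-up 3D points and 2D features standing in for rendered-view descriptors.
kb = build_knowledge_base(
    points=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    features=[(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)],
)
top = retrieve(kb, query_feature=(0.9, 0.1), k=2)
retrieved_points = [point for point, _ in top]
```

In this sketch the retrieved 3D points would be handed to a downstream pose decoder. Attaching geometry to each visual feature is what lets a 2D query return 3D information.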

Problem

Research questions and friction points this paper is trying to address.

Estimating accurate 6D object poses for robotic manipulation
Leveraging CAD models as a knowledge base for pose refinement
Handling occlusions and novel viewpoints in pose estimation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages 3D CAD models as knowledge base
Integrates visual and geometric cues
Refines pose via retrieval-augmented decoding