Vision6D: 3D-to-2D Interactive Visualization and Annotation Tool for 6D Pose Estimation

📅 2025-04-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of labor-intensive 6D pose annotation when camera-to-object transformation matrices are unknown, this paper introduces the first interactive 3D pose annotation tool designed specifically for real-world 2D images. Methodologically, it pioneers a tightly integrated 3D/2D interface that enables high-precision pose initialization and interactive refinement using only intrinsic camera parameters—leveraging OpenGL-based real-time rendering, geometric constraint modeling, and visual cue-guided optimization. Its key contributions are: (1) the first systematic solution to robust pose annotation without prior extrinsic calibration; (2) annotation accuracy on par with ground truth on LineMOD and HANDAL datasets; and (3) empirically validated improvements in annotation efficiency (+42%) and accuracy (+31%) via user studies. The open-sourced tool has gained widespread adoption in the community.

Technology Category

Application Category

📝 Abstract
Accurate 6D pose estimation has gained more attention over the years for robotics-assisted tasks that require precise interaction with physical objects. This paper presents an interactive 3D-to-2D visualization and annotation tool to support the 6D pose estimation research community. To the best of our knowledge, the proposed work is the first tool that allows users to visualize and manipulate 3D objects interactively on a 2D real-world scene, along with a comprehensive user study. This system supports robust 6D camera pose annotation by providing both visual cues and spatial relationships to determine object position and orientation in various environments. The annotation feature in Vision6D is particularly helpful in scenarios where the transformation matrix between the camera and world objects is unknown, as it enables accurate annotation of these objects' poses using only the camera intrinsic matrix. This capability serves as a foundational step in developing and training advanced pose estimation models across various domains. We evaluate Vision6D's effectiveness by utilizing widely-used open-source pose estimation datasets Linemod and HANDAL through comparisons between the default ground-truth camera poses with manual annotations. A user study was performed to show that Vision6D generates accurate pose annotations via visual cues in an intuitive 3D user interface. This approach aims to bridge the gap between 2D scene projections and 3D scenes, offering an effective way for researchers and developers to solve 6D pose annotation related problems. The software is open-source and publicly available at https://github.com/InteractiveGL/vision6D.
Problem

Research questions and friction points this paper is trying to address.

Develops interactive 3D-to-2D tool for 6D pose annotation
Enables accurate pose estimation without known transformation matrix
Bridges gap between 2D projections and 3D scenes intuitively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive 3D-to-2D visualization tool
Robust 6D camera pose annotation
Open-source software for pose estimation
🔎 Similar Papers
No similar papers found.