GoTrack: Generic 6DoF Object Pose Refinement and Tracking

📅 2025-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses generic 6DoF pose refinement and tracking of objects with available CAD models, without any object-specific training. GoTrack is an RGB-only method that combines two registrations, both realized by optical flow estimation: model-to-frame (CAD-to-frame) registration and frame-to-frame registration. The model-to-frame registration uses a transformer trained on top of DINOv2 and produces pose confidence scores without a separate scoring network. The frame-to-frame registration uses a light off-the-shelf optical flow model, which saves compute and stabilizes tracking. Combined with existing coarse pose estimation methods, GoTrack reaches state-of-the-art RGB-only results on standard benchmarks for 6DoF object pose estimation and tracking. The code and trained models are publicly released.

📝 Abstract

We introduce GoTrack, an efficient and accurate CAD-based method for 6DoF object pose refinement and tracking, which can handle diverse objects without any object-specific training. Unlike existing tracking methods that rely solely on an analysis-by-synthesis approach for model-to-frame registration, GoTrack additionally integrates frame-to-frame registration, which saves compute and stabilizes tracking. Both types of registration are realized by optical flow estimation. The model-to-frame registration is noticeably simpler than in existing methods, relying only on standard neural network blocks (a transformer is trained on top of DINOv2) and producing reliable pose confidence scores without a scoring network. For the frame-to-frame registration, which is an easier problem as consecutive video frames are typically nearly identical, we employ a light off-the-shelf optical flow model. We demonstrate that GoTrack can be seamlessly combined with existing coarse pose estimation methods to create a minimal pipeline that reaches state-of-the-art RGB-only results on standard benchmarks for 6DoF object pose estimation and tracking. Our source code and trained models are publicly available at https://github.com/facebookresearch/gotrack
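
The abstract says that adding frame-to-frame registration saves compute, but it does not spell out how the two registrations are scheduled. The sketch below shows one hypothetical policy, not GoTrack's actual one: propagate the pose with the cheap frame-to-frame step on every frame, and run the more expensive model-to-frame step only every few frames, accepting its result when its confidence is high enough. The names `track_sequence`, `model_to_frame`, `frame_to_frame`, `refresh_every`, and `min_confidence` are assumptions made for illustration and are not taken from the GoTrack code.

```python
from typing import Any, Callable, List, Sequence, Tuple


def track_sequence(
    frames: Sequence[Any],
    initial_pose: Any,
    model_to_frame: Callable[[Any, Any], Tuple[Any, float]],
    frame_to_frame: Callable[[Any, Any, Any], Any],
    refresh_every: int = 5,
    min_confidence: float = 0.5,
) -> List[Tuple[Any, str]]:
    """Return one (pose, source) pair per frame.

    Hypothetical scheduling policy (an assumption, not the paper's):
    - frame_to_frame(prev_frame, frame, pose) -> pose is the cheap step.
    - model_to_frame(frame, pose) -> (pose, confidence) is the costly step.
    """
    results: List[Tuple[Any, str]] = []
    pose = initial_pose
    for i, frame in enumerate(frames):
        source = "initial"
        if i > 0:
            # Cheap step: carry the previous pose over to the current frame.
            pose = frame_to_frame(frames[i - 1], frame, pose)
            source = "frame_to_frame"
        if i % refresh_every == 0:
            # Costly step: re-register against the CAD model.
            refined, confidence = model_to_frame(frame, pose)
            if confidence >= min_confidence:
                pose, source = refined, "model_to_frame"
        results.append((pose, source))
    return results
```

In this sketch a low-confidence model-to-frame result is simply ignored and the frame-to-frame estimate is kept. Other fallbacks, such as re-running coarse pose estimation, would be equally plausible.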

Problem

Research questions and friction points this paper is trying to address.

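How to refine and track 6DoF object poses without object-specific training

Existing tracking methods rely solely on analysis-by-synthesis for model-to-frame registration; adding frame-to-frame registration saves compute and stabilizes tracking

How to reach state-of-the-art RGB-only results in 6DoF pose estimation and tracking with a minimal pipeline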

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines model-to-frame and frame-to-frame registration

Uses optical flow for both registration types

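Simplifies model-to-frame registration by using only standard neural network blocks (a transformer trained on top of DINOv2), which yields reliable pose confidence scores without a scoring network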
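
As a rough illustration of how optical flow can serve registration against a CAD model, the sketch below assumes a template rendered from the CAD model in which each pixel is paired with a known 3D model point. Shifting each template pixel by its predicted flow gives a 2D-3D correspondence in the input image, and a standard PnP solver could then estimate the pose from those correspondences. The summary above does not describe GoTrack's internals at this level, so the function `flow_to_correspondences`, its dictionary-based inputs, and the `min_weight` threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]                  # (x, y) in the rendered template
Point2D = Tuple[float, float]            # (x, y) in the input image
Point3D = Tuple[float, float, float]     # point on the CAD model


def flow_to_correspondences(
    template_points: Dict[Pixel, Point3D],
    flow: Dict[Pixel, Tuple[float, float]],
    weights: Dict[Pixel, float],
    min_weight: float = 0.5,
) -> List[Tuple[Point2D, Point3D]]:
    """Build 2D-3D correspondences from template-to-image flow.

    Hypothetical helper: pixels with a missing or low weight are skipped,
    the rest are moved by their flow vector into the input image.
    """
    correspondences: List[Tuple[Point2D, Point3D]] = []
    for (x, y), point_3d in template_points.items():
        if weights.get((x, y), 0.0) < min_weight:
            continue  # unreliable flow, e.g. an occluded pixel
        dx, dy = flow[(x, y)]
        correspondences.append(((x + dx, y + dy), point_3d))
    return correspondences
```

The resulting list could be passed to any PnP solver. Averaging the per-pixel weights would be one simple way to get a pose confidence without a separate scoring network, although that is again an assumption rather than the published design.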

🔎 Similar Papers

OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB