MixRI: Mixing Features of Reference Images for Novel Object Pose Estimation

📅 2026-01-11
🏛️ arXiv.org
🤖 AI Summary
This work addresses 6D pose estimation of unseen objects with known CAD models from RGB images, without fine-tuning. The authors propose a lightweight network that directly matches points between a query image and a set of reference images using a multi-view reference feature fusion strategy. The approach reduces the number of required reference images, which lowers both storage and computational overhead. Evaluated on the seven core datasets of the BOP Challenge, the method achieves performance comparable to that of methods that use more reference views and larger networks, while reducing memory consumption and inference time.

📝 Abstract
We present MixRI, a lightweight network for CAD-based novel object pose estimation from RGB images. It can be applied to a novel object at test time without fine-tuning. We design the network to meet the demands of real-world applications, emphasizing low memory requirements and fast inference. Unlike existing works that rely on many reference images and networks with many parameters, we use a lightweight network to directly match points between the query and reference images based on multi-view information. Thanks to our reference image fusion strategy, we significantly reduce the number of reference images, and with it the time needed to process these images and the memory required to store them. The lightweight network further reduces inference time. Despite using fewer reference images, experiments on the seven core datasets of the BOP challenge show that our method achieves results comparable to those of methods that require more reference images and larger networks.
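
To make the idea of matching query points against reference images concrete, the following is a minimal sketch of descriptor matching by cosine similarity. It is an illustration under stated assumptions, not the MixRI architecture: the abstract does not describe how features are extracted or fused, so the function name `match_query_to_references`, the pooled reference descriptor array, and the toy data are all hypothetical.

```python
import numpy as np

def match_query_to_references(query_feats, ref_feats, ref_points_3d):
    """Nearest-neighbour matching by cosine similarity (illustrative only).

    query_feats:   (Q, D) descriptors of query-image points
    ref_feats:     (R, D) descriptors pooled from all reference views (assumed given)
    ref_points_3d: (R, 3) CAD-model coordinates of the reference points
    Returns the (Q, 3) matched 3D points and the (Q,) similarity scores.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sim = q @ r.T                      # (Q, R) similarity matrix
    best = sim.argmax(axis=1)          # best reference point per query point
    return ref_points_3d[best], sim[np.arange(len(q)), best]

# Toy data: two query descriptors are noisy copies of reference points 3 and 5.
rng = np.random.default_rng(0)
ref_feats = rng.normal(size=(8, 16))
ref_points_3d = rng.normal(size=(8, 3))
query_feats = ref_feats[[3, 5]] + 0.01 * rng.normal(size=(2, 16))

points, scores = match_query_to_references(query_feats, ref_feats, ref_points_3d)
```

Correspondences of this kind (2D query points paired with 3D model points) are commonly passed to a PnP solver with RANSAC to recover the pose; the abstract does not state which solver MixRI uses.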
Problem

Research questions and friction points this paper is trying to address.

novel object pose estimation
CAD-based
RGB images
reference images
lightweight network
Innovation

Methods, ideas, or system contributions that make the work stand out.

novel object pose estimation
lightweight network
reference image fusion
CAD-based
zero-shot inference
Authors

Xinhang Liu
HKUST
Computer Vision
Jiawei Shi
School of Electronics and Information, Northwestern Polytechnical University
Zheng Dang
CVLab, EPFL, Switzerland
Computer Vision, Machine Learning
Yuchao Dai
School of Electronics and Information, Northwestern Polytechnical University