Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation

📅 2025-04-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Category-level 6D pose and size estimation requires strong generalization to unseen object instances, yet existing methods suffer from degraded performance under complex geometries or non-canonical deformations. To address this, we propose an instance-adaptive keypoint modeling framework. First, we design an instance-adaptive keypoint generator that dynamically adjusts to object geometry. Second, we introduce a local-global collaborative aggregation mechanism, comprising bidirectional Mamba-driven global keypoint aggregation and a feature-sequence flipping strategy to enhance structural consistency. Third, we incorporate a surface loss and a separation loss to enforce spatial uniformity and diversity of keypoints. Our method achieves state-of-the-art performance on CAMERA25, REAL275, and HouseCat6D benchmarks, significantly outperforming prior approaches in both accuracy and robustness across diverse object categories and unseen instances.

📝 Abstract
Category-level object pose estimation aims to predict the 6D pose and size of previously unseen instances from predefined categories, requiring strong generalization across diverse object instances. Although many previous methods attempt to mitigate intra-class variations, they often struggle with instances exhibiting complex geometries or significant deviations from canonical shapes. To address this challenge, we propose INKL-Pose, a novel category-level object pose estimation framework that enables INstance-adaptive Keypoint Learning with local-to-global geometric aggregation. Specifically, our approach first predicts semantically consistent and geometrically informative keypoints through an Instance-Adaptive Keypoint Generator, then refines them with (1) a Local Keypoint Feature Aggregator that captures fine-grained geometries and (2) a Global Keypoint Feature Aggregator that uses bidirectional Mamba for structural consistency. To enable bidirectional modeling in Mamba, we introduce a Feature Sequence Flipping strategy that preserves spatial coherence while constructing backward feature sequences. Additionally, we design a surface loss and a separation loss to enforce uniform coverage and spatial diversity in keypoint distribution. The generated keypoints are finally mapped to a canonical space for regressing the object's 6D pose and size. Extensive experiments on CAMERA25, REAL275, and HouseCat6D demonstrate that INKL-Pose achieves state-of-the-art performance and significantly outperforms existing methods.
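
The abstract names a surface loss and a separation loss but does not give their formulas. The sketch below shows one plausible reading, under stated assumptions: the surface term is taken as the mean distance from each keypoint to its nearest observed point, and the separation term as a hinge penalty on keypoint pairs closer than a threshold. The function names, the `min_dist` parameter, and both formulas are illustrative assumptions, not the definitions used in INKL-Pose.

```python
import numpy as np

def surface_loss(keypoints, points):
    """Assumed form: mean distance from each keypoint to its nearest surface point.

    keypoints: (K, 3) predicted keypoints
    points:    (N, 3) observed object point cloud
    """
    # Pairwise Euclidean distances between keypoints and points, shape (K, N).
    d = np.linalg.norm(keypoints[:, None, :] - points[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def separation_loss(keypoints, min_dist=0.1):
    """Assumed form: hinge penalty on keypoint pairs closer than min_dist.

    min_dist is a hypothetical hyperparameter, not a value from the paper.
    """
    d = np.linalg.norm(keypoints[:, None, :] - keypoints[None, :, :], axis=-1)
    # Use each unordered pair once (strict upper triangle).
    iu = np.triu_indices(keypoints.shape[0], k=1)
    return np.maximum(0.0, min_dist - d[iu]).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(256, 3))
    kpts = cloud[:8] + 0.01 * rng.normal(size=(8, 3))
    print("surface:", surface_loss(kpts, cloud))
    print("separation:", separation_loss(kpts))
```

Under this reading, the surface term pulls keypoints onto the observed geometry, while the separation term stops them from collapsing onto the same location; together they would encourage the uniform coverage and spatial diversity the abstract describes.
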
Problem

Research questions and friction points this paper is trying to address.

Estimating 6D pose and size of unseen object instances
Handling complex geometries and shape deviations
Improving generalization across diverse object categories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Instance-adaptive keypoint learning for pose estimation
Local-to-global geometric aggregation with Mamba
Feature sequence flipping for bidirectional modeling
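
The flipping idea can be illustrated with a toy sketch. A causal sequence model only lets each token see earlier tokens, so a backward pass is obtained by reversing the keypoint feature sequence, scanning it, and reversing the result back so positions line up with the forward pass. The code below swaps in a simple exponential-moving-average recurrence as a stand-in for a Mamba block; `causal_scan`, `bidirectional_scan`, the `decay` parameter, and the additive fusion are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def causal_scan(x, decay=0.5):
    """Stand-in for a causal state-space block (NOT Mamba itself).

    x: (L, C) sequence of L keypoint features with C channels.
    Output at step t depends only on x[0..t].
    """
    x = np.asarray(x, dtype=float)
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + (1.0 - decay) * x[t]
        out[t] = h
    return out

def bidirectional_scan(x, decay=0.5):
    """Forward scan plus a scan over the flipped sequence, re-aligned and summed."""
    x = np.asarray(x, dtype=float)
    fwd = causal_scan(x, decay)
    # Flip the sequence, scan it, then flip back so index t matches fwd[t].
    bwd = causal_scan(x[::-1], decay)[::-1]
    return fwd + bwd

if __name__ == "__main__":
    feats = np.arange(12, dtype=float).reshape(4, 3)
    print(bidirectional_scan(feats))
```

With the flipped pass added, the first keypoint's output also depends on the last keypoint's features, which a single causal scan cannot provide.
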
Xiao Zhang
Wuhan Institute of Technology, Wuhan 430205, China
Lu Zou
Lecturer at Wuhan Institute of Technology
computer vision
Tao Lu
Wuhan Institute of Technology, Wuhan 430205, China
Yuan Yao
Wuhan Institute of Technology, Wuhan 430205, China
Zhangjin Huang
University of Science and Technology of China, Hefei 230031, China
Guoping Wang
Peking University, Beijing 100871, China