Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation

📅 2025-04-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Category-level 6D pose and size estimation requires strong generalization to unseen object instances, yet existing methods degrade on instances with complex geometries or large deviations from the canonical shape. To address this, the paper proposes INKL-Pose, an instance-adaptive keypoint learning framework. First, an instance-adaptive keypoint generator predicts keypoints that adjust to each object's geometry. Second, a local-to-global aggregation mechanism refines them: a local aggregator captures fine-grained geometry, and a bidirectional Mamba-driven global aggregator, enabled by a feature-sequence flipping strategy, enforces structural consistency. Third, a surface loss and a separation loss encourage uniform coverage and spatial diversity of the keypoints. The method reports state-of-the-art performance on the CAMERA25, REAL275, and HouseCat6D benchmarks.

📝 Abstract

Category-level object pose estimation aims to predict the 6D pose and size of previously unseen instances from predefined categories, requiring strong generalization across diverse object instances. Although many previous methods attempt to mitigate intra-class variations, they often struggle with instances exhibiting complex geometries or significant deviations from canonical shapes. To address this challenge, we propose INKL-Pose, a novel category-level object pose estimation framework that enables INstance-adaptive Keypoint Learning with local-to-global geometric aggregation. Specifically, our approach first predicts semantically consistent and geometrically informative keypoints through an Instance-Adaptive Keypoint Generator, then refines them with (1) a Local Keypoint Feature Aggregator that captures fine-grained geometries and (2) a Global Keypoint Feature Aggregator that uses bidirectional Mamba for structural consistency. To enable bidirectional modeling in Mamba, we introduce a Feature Sequence Flipping strategy that preserves spatial coherence while constructing backward feature sequences. Additionally, we design a surface loss and a separation loss to enforce uniform coverage and spatial diversity in keypoint distribution. The generated keypoints are finally mapped to a canonical space for regressing the object's 6D pose and size. Extensive experiments on CAMERA25, REAL275, and HouseCat6D demonstrate that INKL-Pose achieves state-of-the-art performance and significantly outperforms existing methods.
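The abstract names a surface loss and a separation loss but does not give their formulas. The sketch below shows one plausible reading, not the paper's definition: the surface term is assumed to be the mean distance from each keypoint to its nearest observed point, and the separation term is assumed to be a hinge penalty on keypoint pairs closer than a margin. The function names, the `margin` parameter, and the toy data are all hypothetical.

```python
import math


def surface_loss(keypoints, points):
    """Assumed form: mean distance from each keypoint to its nearest observed point."""
    nearest = [min(math.dist(k, p) for p in points) for k in keypoints]
    return sum(nearest) / len(nearest)


def separation_loss(keypoints, margin):
    """Assumed form: mean hinge penalty over keypoint pairs closer than `margin`."""
    pairs = [(a, b) for i, a in enumerate(keypoints) for b in keypoints[i + 1:]]
    return sum(max(0.0, margin - math.dist(a, b)) for a, b in pairs) / len(pairs)


# Toy data (hypothetical): three observed surface points and two keypoints.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
keypoints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]

surf = surface_loss(keypoints, points)             # second keypoint floats 0.5 off the surface
sep_far = separation_loss(keypoints, margin=1.0)   # pair is farther apart than the margin
sep_near = separation_loss(keypoints, margin=2.0)  # pair is closer than the margin
```

Under these assumptions the surface term pulls keypoints onto the observed geometry, while the separation term is zero once every pair is at least `margin` apart, so together they favor keypoints that lie on the object and do not cluster.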

Problem

Research questions and friction points this paper is trying to address.

Estimating 6D pose and size of unseen object instances

Handling complex geometries and shape deviations

Improving generalization across diverse object categories

Innovation

Methods, ideas, or system contributions that make the work stand out.

Instance-adaptive keypoint learning for pose estimation

Local-to-global geometric aggregation with Mamba

Feature sequence flipping for bidirectional modeling
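The flipping idea in the last item can be illustrated with a toy example. This is a minimal sketch under stated assumptions: `causal_scan` is a simple decaying recurrence standing in for a Mamba block (it is not Mamba), and the way the two directions are merged (element-wise sum) is an assumption. How the paper preserves spatial coherence when building the backward sequence is not described in the text above, so it is not modeled here.

```python
def causal_scan(seq, decay=0.5):
    """Toy causal recurrence h_t = decay * h_{t-1} + x_t (hypothetical stand-in for a state-space block)."""
    out, h = [], [0.0] * len(seq[0])
    for x in seq:
        h = [decay * hi + xi for hi, xi in zip(h, x)]
        out.append(h)
    return out


def bidirectional_scan(seq, decay=0.5):
    """Scan the sequence forward, then flipped, and merge both views per position."""
    fwd = causal_scan(seq, decay)
    # Flip the sequence, scan it causally, then flip the result back so that
    # position t of `bwd` summarizes features t..end in the original order.
    bwd = causal_scan(seq[::-1], decay)[::-1]
    return [[f + b for f, b in zip(fr, br)] for fr, br in zip(fwd, bwd)]


# Three keypoints with one feature channel each (toy values).
features = [[1.0], [2.0], [4.0]]
result = bidirectional_scan(features)
```

A causal scan alone lets each keypoint see only the keypoints before it in the sequence; running the same scan on the flipped sequence and flipping the output back gives every position context from both sides.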
