Category-Level Object Shape and Pose Estimation in Less Than a Millisecond

📅 2025-09-23
🤖 AI Summary
This paper addresses real-time estimation of category-level object shape and 6D pose for robotic tasks. It introduces a fast local solver built from four parts: (1) a learned front-end detects sparse semantic keypoints from RGB-D input; (2) intra-category shape variation is modeled with a linear active shape model; (3) joint maximum a posteriori estimation of position, orientation, and shape is expressed in unit quaternions, so that its first-order optimality conditions take the form of an eigenvalue problem with eigenvector nonlinearities; (4) self-consistent field iteration solves this problem, requiring only the construction of a 4×4 matrix and the computation of its minimum eigenvalue-eigenvector pair at each iterate. One iteration takes about 100 μs, and solving a linear system for the corresponding Lagrange multipliers gives a certificate of global optimality. Evaluation on synthetic data, two public datasets, and a real-world drone tracking scenario shows accurate and robust category-level estimation.

๐Ÿ“ Abstract
Object shape and pose estimation is a foundational robotics problem, supporting tasks from manipulation to scene understanding and navigation. We present a fast local solver for shape and pose estimation which requires only category-level object priors and admits an efficient certificate of global optimality. Given an RGB-D image of an object, we use a learned front-end to detect sparse, category-level semantic keypoints on the target object. We represent the target object's unknown shape using a linear active shape model and pose a maximum a posteriori optimization problem to solve for position, orientation, and shape simultaneously. Expressed in unit quaternions, this problem admits first-order optimality conditions in the form of an eigenvalue problem with eigenvector nonlinearities. Our primary contribution is to solve this problem efficiently with self-consistent field iteration, which only requires computing a 4-by-4 matrix and finding its minimum eigenvalue-vector pair at each iterate. Solving a linear system for the corresponding Lagrange multipliers gives a simple global optimality certificate. One iteration of our solver runs in about 100 microseconds, enabling fast outlier rejection. We test our method on synthetic data and a variety of real-world settings, including two public datasets and a drone tracking scenario. Code is released at https://github.com/MIT-SPARK/Fast-ShapeAndPose.
Problem

Research questions and friction points this paper is trying to address.

Estimating object shape and pose from RGB-D images using only category-level priors
Solving a maximum a posteriori optimization problem for position, orientation, and shape simultaneously
Developing a fast local solver, with a global optimality certificate, that runs in under a millisecond
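
To make the quantities in this problem concrete, the sketch below shows one common way to write a linear active shape model and the keypoints it predicts under a pose. It is an illustration only, not the paper's code: the function names, the (w, x, y, z) quaternion ordering, and the plain least-squares cost are assumptions, and the paper's maximum a posteriori objective may include priors and weights that are omitted here.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix for a quaternion q = (w, x, y, z); q is normalized first."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def shape_keypoints(basis, c):
    """Linear active shape model: blend K basis shapes (K, N, 3) with weights c (K,)."""
    return np.tensordot(c, basis, axes=1)  # result has shape (N, 3)

def predict_keypoints(basis, c, q, t):
    """Rotate the blended shape by q, then translate by t."""
    return shape_keypoints(basis, c) @ quat_to_rot(q).T + t

def residual_cost(measured, basis, c, q, t):
    """Sum of squared keypoint residuals (assumed cost, no shape prior)."""
    return float(np.sum((measured - predict_keypoints(basis, c, q, t)) ** 2))

# Hypothetical category with K = 2 basis shapes and N = 2 keypoints.
basis = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                  [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]])
c = np.array([0.5, 0.5])                                        # shape coefficients
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])  # 90 degrees about z
t = np.array([0.0, 0.0, 1.0])                                   # translation

pred = predict_keypoints(basis, c, q, t)
```

The unknowns the solver must recover are exactly `c`, `q`, and `t`; the detected keypoints play the role of `measured`.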
Innovation

Methods, ideas, or system contributions that make the work stand out.

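A learned front-end detects sparse, category-level semantic keypoints
Maximum a posteriori optimization over pose and shape, with shape represented by a linear active shape model
Self-consistent field iteration solves the problem quickly, and a linear solve for the Lagrange multipliers certifies global optimality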
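
Self-consistent field iteration, in its generic form, can be sketched as follows: build a matrix that depends on the current unit vector, take the eigenvector of its smallest eigenvalue as the next iterate, and repeat until the vector stops changing. The code below is a minimal illustration of that loop on a made-up 4×4 problem. `toy_matrix` is a hypothetical stand-in, not the matrix derived in the paper, and plain SCF is not guaranteed to converge for every problem.

```python
import numpy as np

def scf_min_eigvec(build_matrix, q0, tol=1e-12, max_iters=200):
    """Generic SCF loop for H(q) q = lam q with lam the smallest eigenvalue of H(q)."""
    q = np.asarray(q0, dtype=float)
    q = q / np.linalg.norm(q)
    for k in range(1, max_iters + 1):
        # eigh returns eigenvalues in ascending order for symmetric matrices
        evals, evecs = np.linalg.eigh(build_matrix(q))
        lam, q_new = evals[0], evecs[:, 0]
        # q and -q encode the same rotation, so fix the sign before comparing
        if q_new @ q < 0:
            q_new = -q_new
        step = np.linalg.norm(q_new - q)
        q = q_new
        if step < tol:
            return q, lam, k
    raise RuntimeError("SCF did not converge within max_iters")

# Hypothetical symmetric 4x4 problem with a mild dependence on q.
A = np.array([[1.0, 0.1, 0.0, 0.2],
              [0.1, 2.0, 0.3, 0.0],
              [0.0, 0.3, 3.0, 0.1],
              [0.2, 0.0, 0.1, 4.0]])

def toy_matrix(q, alpha=0.2):
    """Toy eigenvector nonlinearity: the diagonal shifts with the squared entries of q."""
    return A + alpha * np.diag(q ** 2)

q, lam, iters = scf_min_eigvec(toy_matrix, q0=[1.0, 0.0, 0.0, 0.0])
residual = np.linalg.norm(toy_matrix(q) @ q - lam * q)
```

At convergence, `q` is a unit vector that is (up to the tolerance) the minimum eigenvector of the matrix built from itself, which is the fixed-point property the optimality conditions ask for. Each pass costs one 4×4 symmetric eigendecomposition, which is why a single iteration can be so cheap.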