Uncertainty Quantification for Visual Object Pose Estimation

πŸ“… 2025-11-26
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Quantifying uncertainty in 6D object pose estimation from monocular vision is critical for robust robotic control, yet existing methods typically rely on strong distributional assumptions. This paper proposes SLUE (S-Lemma Uncertainty Estimation), an initialization-free, distribution-free uncertainty quantification method that guarantees, with high probability, containment of the true pose within an ellipsoidal confidence region. SLUE assumes only bounded pixel noise on 2D semantic keypoint detections and uses the S-lemma to convexify the resulting nonconvex geometric constraints into a single convex program that bounds the pose set with an ellipsoid. A sum-of-squares (SOS) relaxation hierarchy extends SLUE with approximations that provably converge to the minimum-volume enclosing ellipsoid, tightening the bound at the same confidence level. The resulting pose bound projects easily onto decoupled translation and axis-angle orientation bounds. Evaluated on two pose estimation benchmarks and a real-world drone tracking scenario, SLUE yields substantially smaller translation bounds than prior work while remaining competitive on orientation.

πŸ“ Abstract
Quantifying the uncertainty of an object's pose estimate is essential for robust control and planning. Although pose estimation is a well-studied robotics problem, attaching statistically rigorous uncertainty is not well understood without strict distributional assumptions. We develop distribution-free pose uncertainty bounds about a given pose estimate in the monocular setting. Our pose uncertainty only requires high probability noise bounds on pixel detections of 2D semantic keypoints on a known object. This noise model induces an implicit, non-convex set of pose uncertainty constraints. Our key contribution is SLUE (S-Lemma Uncertainty Estimation), a convex program to reduce this set to a single ellipsoidal uncertainty bound that is guaranteed to contain the true object pose with high probability. SLUE solves a relaxation of the minimum volume bounding ellipsoid problem inspired by the celebrated S-lemma. It requires no initial guess of the bound's shape or size and is guaranteed to contain the true object pose with high probability. For tighter uncertainty bounds at the same confidence, we extend SLUE to a sum-of-squares relaxation hierarchy which is guaranteed to converge to the minimum volume ellipsoidal uncertainty bound for a given set of keypoint constraints. We show this pose uncertainty bound can easily be projected to independent translation and axis-angle orientation bounds. We evaluate SLUE on two pose estimation datasets and a real-world drone tracking scenario. Compared to prior work, SLUE generates substantially smaller translation bounds and competitive orientation bounds. We release code at https://github.com/MIT-SPARK/PoseUncertaintySets.
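The S-lemma the abstract refers to gives a convex certificate that one quadratic set contains another, which is what lets SLUE replace the implicit non-convex constraint set with a single ellipsoid. A minimal numpy sketch (not the paper's solver; a grid search over the multiplier Ξ» stands in for a proper SDP) checks ellipsoid containment this way:

```python
import numpy as np

def slemma_contained(A1, b1, c1, A2, b2, c2, lams=np.linspace(0, 10, 1001)):
    """Check E1 = {x : x'A1 x + 2 b1'x + c1 <= 0} lies inside E2 (same form)
    via the S-lemma: E1 is in E2 iff some lam >= 0 makes M2 - lam*M1
    negative semidefinite, where Mi stacks (Ai, bi, ci) into one matrix."""
    def M(A, b, c):
        return np.block([[A, b.reshape(-1, 1)],
                         [b.reshape(1, -1), np.array([[c]])]])
    M1, M2 = M(A1, b1, c1), M(A2, b2, c2)
    for lam in lams:
        if np.max(np.linalg.eigvalsh(M2 - lam * M1)) <= 1e-9:
            return True  # found a certificate of containment
    return False

# unit disk inside radius-2 disk, and the (false) reverse claim
I, z = np.eye(2), np.zeros(2)
print(slemma_contained(I, z, -1.0, I, z, -4.0))  # True
print(slemma_contained(I, z, -4.0, I, z, -1.0))  # False
```

The grid search is only for illustration; in practice the multiplier is a decision variable in a semidefinite program, which is how a convex solver would handle it.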
Problem

Research questions and friction points this paper is trying to address.

Attaching statistically rigorous uncertainty to monocular object pose estimates is poorly understood without strict distributional assumptions
Bounded keypoint detection noise induces an implicit, non-convex set of pose constraints that is hard to use directly for control
Robust control and planning need tractable, decoupled translation and orientation uncertainty bounds
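The decoupling in the last point has a clean closed form for ellipsoids: the shadow of {x : (x βˆ’ c)α΅€ P⁻¹ (x βˆ’ c) ≀ 1} on a coordinate block is the ellipsoid with the matching sub-block of P. A numpy sketch under an assumed 6D pose parameterization (translation in coordinates 0-2, axis-angle in 3-5; this is illustrative, not the paper's code):

```python
import numpy as np

def project_ellipsoid(c, P, idx):
    """Project {x : (x-c)' inv(P) (x-c) <= 1} onto the coordinates in idx.
    The shadow is again an ellipsoid: center c[idx], shape P[idx][:, idx]."""
    idx = np.asarray(idx)
    return c[idx], P[np.ix_(idx, idx)]

# hypothetical 6D pose ellipsoid: coords 0-2 translation, 3-5 axis-angle
rng = np.random.default_rng(0)
L = rng.normal(size=(6, 6))
P = L @ L.T + 1e-3 * np.eye(6)        # random symmetric positive-definite shape
c = rng.normal(size=6)
ct, Pt = project_ellipsoid(c, P, [0, 1, 2])   # translation-only bound

# sanity check: boundary points of the 6D set project inside the 3D shadow
U = rng.normal(size=(6, 2000))
U /= np.linalg.norm(U, axis=0)                # directions on the unit sphere
X = c[:, None] + np.linalg.cholesky(P) @ U    # points on the 6D boundary
d = X[:3] - ct[:, None]
vals = np.einsum('ij,ji->i', d.T @ np.linalg.inv(Pt), d)
print(np.all(vals <= 1 + 1e-6))  # True
```

The sub-block rule follows from the ellipsoid's support function, which is why no extra optimization is needed to read off a translation-only bound.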
Innovation

Methods, ideas, or system contributions that make the work stand out.

SLUE reduces the non-convex keypoint constraints to a single ellipsoidal bound via an S-lemma-inspired convex program
Requires only high-probability noise bounds on 2D keypoint detections, with no initial guess of the bound's shape or size
A sum-of-squares relaxation hierarchy converges to the minimum-volume ellipsoidal bound for tighter uncertainty at the same confidence
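How a pixel noise bound turns into a pose constraint can be sketched in a few lines: dividing by depth makes the reprojection residual nonconvex in the pose, but multiplying the bound through by the (positive) depth leaves a constraint that is quadratic in (R, t). All numbers below are hypothetical, not from the paper:

```python
import numpy as np

f = 500.0                          # assumed focal length, pixels
X = np.array([0.1, -0.05, 0.2])    # known 3D keypoint in the object frame
R = np.eye(3)                      # true rotation (identity for the sketch)
t = np.array([0.0, 0.0, 2.0])      # true translation, object in front of camera
eps = 2.0                          # high-probability pixel noise bound

p = R @ X + t                      # keypoint in the camera frame
u_true = f * p[0] / p[2]           # noiseless horizontal pixel coordinate
u_meas = u_true + 1.3              # any detection within the noise bound

# |u_meas - f*p_x/p_z| <= eps is nonconvex in the pose; clearing the depth
# gives |u_meas*p_z - f*p_x| <= eps*p_z, quadratic in the entries of (R, t).
lhs = abs(u_meas * p[2] - f * p[0])
print(lhs <= eps * p[2])  # True: the true pose satisfies the cleared constraint
```

Stacking such constraints over all keypoints is what yields the implicit quadratic pose set that the S-lemma and SOS machinery then bound with an ellipsoid.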