🤖 AI Summary
This work addresses the challenge of jointly estimating a target's motion and physical dimensions from monocular vision, a task often hindered by restrictive assumptions, such as an isotropic target shape or purely lateral motion, that limit real-world applicability. The authors propose a novel bearing-box approach that leverages the 3D bounding box outputs of modern 3D object detectors to improve system observability without such assumptions. The method integrates 3D detection, observability analysis, and nonlinear state estimation with a compact dynamical model of multi-rotor micro aerial vehicles (MAVs), exploiting the coupling between thrust and acceleration to remove the need for higher-order motion assumptions. Both theoretical analysis and experimental results demonstrate superior performance over existing bearing-only estimators in jointly recovering target motion and size.
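To make the idea concrete, here is a minimal sketch, not the authors' implementation, of how a bearing-box measurement might enter a nonlinear filter. The state layout, constant-velocity prediction, and measurement model (unit bearing plus apparent angular extents of the box) are all illustrative assumptions on my part:

```python
# Minimal sketch (illustrative, not the paper's estimator): an EKF whose
# measurement couples the bearing direction with the apparent size of a
# 3D bounding box. State x = [p (3), v (3), s (3)]: relative target
# position, velocity, and physical box dimensions (all hypothetical).
import numpy as np

def f(x, dt):
    """Constant-velocity prediction: p += v*dt; v and s unchanged."""
    x = x.copy()
    x[0:3] += dt * x[3:6]
    return x

def h(x):
    """Bearing-box measurement: unit bearing to the target plus the
    apparent angular extents of its box (size divided by range)."""
    p, s = x[0:3], x[6:9]
    r = np.linalg.norm(p)
    return np.concatenate([p / r, s / r])

def numerical_jacobian(fun, x, eps=1e-6):
    """Forward-difference Jacobian of fun at x."""
    y0 = fun(x)
    J = np.zeros((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (fun(x + dx) - y0) / eps
    return J

def ekf_step(x, P, z, dt, Q, R):
    """One predict-update cycle with a 6-D bearing-box measurement z."""
    x_pred = f(x, dt)
    F = numerical_jacobian(lambda v: f(v, dt), x)
    P_pred = F @ P @ F.T + Q
    H = numerical_jacobian(h, x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(x.size) - K @ H) @ P_pred
    return x_new, P_new
```

Note that in this toy model a joint scaling of p and s leaves h unchanged (p/r is scale-invariant and s/r is preserved when both scale together), which illustrates why the motion model, or a known acceleration, is essential for observability of range and size.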
📝 Abstract
Monocular vision-based target motion estimation is a fundamental challenge in numerous applications. This work introduces a novel bearing-box approach that fully leverages modern 3D detection measurements, which are now widely available but have so far been little explored for motion estimation. Unlike existing methods that rely on restrictive assumptions such as an isotropic target shape and lateral motion, our bearing-box estimator can estimate both the target's motion and its physical size without these assumptions by exploiting the information buried in a 3D bounding box. When applied to multi-rotor micro aerial vehicles (MAVs), the estimator yields an interesting advantage: it further removes the need for higher-order motion assumptions by exploiting the unique coupling between an MAV's acceleration and its thrust. This is particularly significant, as higher-order motion assumptions are widely believed to be necessary in state-of-the-art bearing-based estimators. We support our claims with rigorous observability analyses and extensive experimental validation, demonstrating the estimator's superior performance in real-world scenarios.
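The thrust-acceleration coupling the abstract refers to follows from the standard multirotor translational model; the generic textbook form is shown below, and the paper's exact formulation may differ:

```latex
% Standard multirotor translational dynamics: m is the mass, f the
% collective thrust magnitude, R the attitude rotation matrix, g the
% gravitational acceleration, and e3 = [0, 0, 1]^T.
\[
  m\,\ddot{\mathbf{p}} \;=\; f\,\mathbf{R}\,\mathbf{e}_3 \;-\; m g\,\mathbf{e}_3 .
\]
% Because the only unknown forcing term is the thrust vector f R e3,
% the target's acceleration is algebraically determined by its attitude
% and thrust, so no constant-velocity or constant-acceleration prior
% on the target's motion is required.
```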