🤖 AI Summary
This work addresses state estimation for discrete-time nonlinear stochastic systems with finite-dimensional states and infinite-dimensional measurements—such as image fields—arising in visual localization and tracking. Conventional methods struggle to rigorously model the relationship between image gradients and the Jacobian of the observation function with respect to the state. To overcome this, we propose an extended Kalman filter (EKF) grounded in infinite-dimensional random field modeling. Crucially, we establish, for the first time from a systems-theoretic perspective, that image gradients correspond precisely to the Fréchet derivative of the observation functional with respect to the state—thereby providing a rigorous mathematical foundation for gradient-based visual state estimation. Evaluated on monocular visual-inertial navigation for unmanned aerial vehicles, our approach reduces root-mean-square error by an order of magnitude compared to VINS-MONO, demonstrating substantial improvements in both estimation accuracy and robustness.
📝 Abstract
This article examines state estimation in discrete-time nonlinear stochastic systems with finite-dimensional states and infinite-dimensional measurements, motivated by real-world applications such as vision-based localization and tracking. We develop an extended Kalman filter (EKF) for real-time state estimation, with the measurement noise modeled as an infinite-dimensional random field. When applied to vision-based state estimation, the measurement Jacobians required to implement the EKF are shown to correspond to image gradients. This result provides a novel system-theoretic justification for the use of image gradients as features for vision-based state estimation, contrasting with their (often heuristic) introduction in many computer-vision pipelines. We demonstrate the practical utility of the EKF on a public real-world dataset involving the localization of an aerial drone using video from a downward-facing monocular camera. The EKF is shown to outperform VINS-MONO, an established visual-inertial odometry algorithm, in some cases achieving mean squared error reductions of up to an order of magnitude.