🤖 AI Summary
Conventional rigid-body assumptions fail for soft robots due to dynamic elastic deformations, and monocular visual odometry (VO) cannot recover metric scale or gravity direction.
Method: We propose a continuous-time state estimation framework integrating physical dynamics priors. Camera trajectory is parameterized via B-splines; elastic deformation is modeled by a multi-layer perceptron (MLP) learning the force–deformation mapping; and visual measurements are explicitly coupled with inertial dynamics through Newton’s second law. The optimization jointly enforces geometric consistency, motion continuity, and physical interpretability.
Results: Evaluated on a spring-mounted camera platform, our method significantly improves pose estimation accuracy and robustness. To the best of our knowledge, it is the first monocular VO framework for non-rigid systems that achieves end-to-end perception of true metric scale, gravity orientation, and inertial alignment, without external calibration or prior knowledge of system parameters.
📝 Abstract
Accurate state estimation for flexible robotic systems poses significant challenges, particularly for platforms with dynamically deforming structures that invalidate rigid-body assumptions. This paper tackles this problem and enables existing rigid-body pose estimation methods to be extended to non-rigid systems. Our approach hinges on two core assumptions: first, the elastic properties are captured by an injective deformation-force model, efficiently learned via a Multi-Layer Perceptron; second, the platform's motion is inherently smooth and can be represented with continuous-time B-spline kinematic models. By continuously applying Newton's Second Law, our method establishes a physical link between visually-derived trajectory acceleration and predicted deformation-induced acceleration. We demonstrate that our approach not only enables robust and accurate pose estimation on non-rigid platforms, but also that properly modeled platform physics give rise to inertial sensing properties. We validate the approach on a simple spring-camera system and show how it robustly resolves the typically ill-posed problem of metric scale and gravity recovery in monocular visual odometry.
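The physical link described in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the paper's implementation: a linear spring (`spring_force`, with a hypothetical stiffness `k`) stands in for the learned MLP, the trajectory is a uniform cubic B-spline along the vertical axis only, and all function and parameter names are assumptions made for illustration. The residual compares the mass times the spline acceleration with the spring force plus gravity.

```python
import numpy as np

def bspline_pos(ctrl, i, u):
    """Position on segment i of a uniform cubic B-spline, u in [0, 1)."""
    p0, p1, p2, p3 = ctrl[i:i + 4]
    return ((1 - u) ** 3 * p0
            + (3 * u**3 - 6 * u**2 + 4) * p1
            + (-3 * u**3 + 3 * u**2 + 3 * u + 1) * p2
            + u**3 * p3) / 6.0

def bspline_accel(ctrl, i, u, dt):
    """Second time derivative on segment i; dt is the knot spacing in seconds."""
    p0, p1, p2, p3 = ctrl[i:i + 4]
    return ((1 - u) * p0 + (3 * u - 2) * p1 + (1 - 3 * u) * p2 + u * p3) / dt**2

def spring_force(z, k):
    """Hypothetical linear stand-in for the learned deformation-force model."""
    return -k * z  # z is the deformation from the unloaded spring position

def newton_residual(ctrl, i, u, dt, mass, k, g=9.81, scale=1.0):
    """Residual of m * a = F_spring(z) - m * g along the vertical (up) axis.

    `scale` converts the up-to-scale visual trajectory into metres.
    """
    z = scale * bspline_pos(ctrl, i, u)
    a = scale * bspline_accel(ctrl, i, u, dt)
    return mass * a - (spring_force(z, k) - mass * g)

# Camera hanging at rest: the spring extension balances gravity.
mass, k, g = 0.2, 50.0, 9.81
ctrl_rest = np.full(6, -mass * g / k)
r_true = newton_residual(ctrl_rest, 0, 0.3, 0.01, mass, k)
r_wrong = newton_residual(ctrl_rest, 0, 0.3, 0.01, mass, k, scale=2.0)
```

In this toy setting the residual vanishes only when the trajectory is expressed in true metric units, because the gravity term does not scale with the trajectory. This is one way to see why coupling vision with the platform dynamics can make scale and gravity observable.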