🤖 AI Summary
This paper addresses real-time prediction of robot trajectories on unstructured off-road terrain from onboard camera images, and proposes a physics-guided, end-to-end differentiable trajectory prediction framework. The method introduces a neural-symbolic physics layer that couples an image-conditioned network, which predicts robot-terrain interaction forces, with a differentiable physics engine. Because the architecture embeds geometric and physical priors, it can be viewed as a learnable physics engine conditioned on real images. The paper reports three main outcomes: (1) a reduced sim-to-real gap; (2) lower sensitivity to out-of-distribution terrain; and (3) simulation throughput of 10⁴ trajectories per second, which makes the model suitable for downstream tasks including model predictive control (MPC), trajectory shooting, supervised and reinforcement learning, and simultaneous localization and mapping (SLAM).
📝 Abstract
We propose a novel model for predicting robot trajectories on rough off-road terrain from onboard camera images. The model enforces the laws of classical mechanics through a physics-aware neural-symbolic layer and, because it is end-to-end differentiable, retains the ability to learn from large-scale data. The proposed hybrid model integrates a black-box component that predicts robot-terrain interaction forces with a neural-symbolic layer. This layer includes a differentiable physics engine that computes the robot's trajectory by querying these forces at the points of contact with the terrain. As the proposed architecture embeds substantial geometric and physical priors, the resulting model can also be seen as a learnable physics engine conditioned on real images that delivers $10^4$ trajectories per second. We argue and empirically demonstrate that this architecture reduces the sim-to-real gap and mitigates out-of-distribution sensitivity. The differentiability, in conjunction with the rapid simulation speed, makes the model well-suited for various applications, including model predictive control, trajectory shooting, supervised and reinforcement learning, and SLAM. The code and data are publicly available.
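The idea of a physics engine that computes a trajectory by querying forces at contact points can be illustrated with a minimal sketch. This is not the authors' implementation: it reduces the robot to a single point mass with one contact point, uses semi-implicit Euler integration, and substitutes a hand-written spring-damper for the learned, image-conditioned force predictor. All names (`spring_damper_force`, `rollout`, `terrain_height`) and all parameter values are assumptions chosen for illustration.

```python
from typing import Callable, List, Tuple

# State of a point mass in the vertical plane: (x, z, vx, vz).
State = Tuple[float, float, float, float]

GRAVITY = 9.81  # m/s^2


def spring_damper_force(penetration: float, vz: float,
                        stiffness: float = 2000.0,
                        damping: float = 200.0) -> float:
    """Hypothetical stand-in for the learned force predictor.

    Returns the upward normal force at a single contact point.
    The force is zero when there is no penetration and is clamped
    so that the terrain can only push, never pull.
    """
    if penetration <= 0.0:
        return 0.0
    return max(0.0, stiffness * penetration - damping * vz)


def rollout(state: State,
            terrain_height: Callable[[float], float],
            force_fn: Callable[[float, float], float],
            mass: float = 10.0,
            dt: float = 0.001,
            steps: int = 3000) -> List[State]:
    """Integrate the point mass forward, querying force_fn at the contact point."""
    x, z, vx, vz = state
    trajectory = [state]
    for _ in range(steps):
        # How far the contact point is below the terrain surface.
        penetration = terrain_height(x) - z
        # Net vertical force: contact force from the queried model plus gravity.
        fz = force_fn(penetration, vz) - mass * GRAVITY
        # Semi-implicit Euler: update velocity first, then position.
        vz += dt * fz / mass
        x += dt * vx
        z += dt * vz
        trajectory.append((x, z, vx, vz))
    return trajectory


# Usage: drop the mass from 0.5 m onto flat terrain while it drifts forward.
flat_terrain = lambda x: 0.0
traj = rollout((0.0, 0.5, 1.0, 0.0), flat_terrain, spring_damper_force)
x_end, z_end, vx_end, vz_end = traj[-1]
print(f"final position: x={x_end:.3f} m, z={z_end:.4f} m")
```

Every operation in the loop is ordinary arithmetic, so the same structure written in an automatic-differentiation framework would allow gradients of a trajectory loss to flow back into the parameters of the force model. That is the property the abstract relies on for end-to-end learning; the pure-Python version here only shows the forward rollout.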