🤖 AI Summary
This work proposes REAL, an end-to-end robust control framework addressing the fragility of quadrupedal robots during extreme parkour tasks under perceptual degradation such as visual noise or latency. REAL integrates visual inputs, proprioceptive history, and short-term terrain memory through a novel architecture combining a FiLM-modulated Mamba backbone, cross-modal policy distillation, and physics-informed Bayesian state estimation. This design enables active suppression of visual disturbances while ensuring rigid-body dynamics consistency during high-impact maneuvers. Experiments on the Unitree Go2 platform demonstrate that the system reliably traverses complex obstacles even with a 1-meter visual blind zone, achieving a low inference latency of 13.1 milliseconds, sufficient for real-time control.
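FiLM (feature-wise linear modulation) itself is a standard conditioning mechanism: one modality predicts a per-channel scale and shift that is applied to another modality's features. The sketch below illustrates the general idea in NumPy; the weights, dimensions, and the idea of conditioning visual features on a proprioceptive cue are illustrative assumptions, not details from the paper.

```python
import numpy as np

def film_modulate(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise Linear Modulation (FiLM): scale and shift each feature
    channel using parameters predicted linearly from a conditioning vector."""
    gamma = cond @ w_gamma + b_gamma  # per-channel scale
    beta = cond @ w_beta + b_beta     # per-channel shift
    return gamma * features + beta

# Toy example: 4-channel "visual" features conditioned on a 2-dim
# "proprioceptive" cue (dimensions are illustrative only).
features = np.array([1.0, 2.0, 3.0, 4.0])
cond = np.array([0.5, -0.5])
w_gamma = np.zeros((2, 4)); b_gamma = np.ones(4)   # gamma = 1 -> identity scale
w_beta = np.zeros((2, 4));  b_beta = np.zeros(4)   # beta = 0 -> no shift
out = film_modulate(features, cond, w_gamma, b_gamma, w_beta, b_beta)
# With these identity parameters the features pass through unchanged;
# a learned predictor would instead amplify or suppress channels per context.
```

In a trained network, `w_gamma`/`w_beta` would be learned, letting the conditioning signal suppress unreliable visual channels, which is the role FiLM plays in the described architecture.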
📝 Abstract
Extreme legged parkour demands rapid terrain assessment and precise foot placement under highly dynamic conditions. While recent learning-based systems achieve impressive agility, they remain fundamentally fragile to perceptual degradation, where even brief visual noise or latency can cause catastrophic failure. To overcome this, we propose Robust Extreme Agility Learning (REAL), an end-to-end framework for reliable parkour under sensory corruption. Instead of relying on perfectly clean perception, REAL tightly couples vision, proprioceptive history, and temporal memory. We distill a cross-modal teacher policy into a deployable student equipped with a FiLM-modulated Mamba backbone that actively filters visual noise and builds short-term terrain memory. Furthermore, a physics-guided Bayesian state estimator enforces rigid-body consistency during high-impact maneuvers. Validated on a Unitree Go2 quadruped, REAL successfully traverses extreme obstacles even with a 1-meter visual blind zone, while strictly satisfying real-time control constraints with a bounded 13.1 ms inference time.
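The "physics-guided Bayesian state estimator" family of techniques can be illustrated with a minimal Kalman filter: a physics model (here, simple rigid-body kinematics) predicts the state, and noisy measurements correct it. This is a generic 1-D sketch, not the paper's estimator; the dynamics model, noise levels, and measurement values are all assumed for illustration.

```python
import numpy as np

def kalman_step(x, P, z, a, dt, q, r):
    """One predict/update cycle of a 1-D Kalman filter.

    x: state [position, velocity]; P: state covariance;
    z: noisy position measurement; a: commanded acceleration;
    q, r: process and measurement noise variances (illustrative values).
    """
    # Predict with rigid-body kinematics.
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
    B = np.array([0.5 * dt**2, dt])         # acceleration input
    x_pred = F @ x + B * a
    P_pred = F @ P @ F.T + q * np.eye(2)
    # Correct with the measurement, weighted by relative uncertainty.
    H = np.array([[1.0, 0.0]])              # we only measure position
    S = H @ P_pred @ H.T + r                # innovation covariance
    K = P_pred @ H.T / S                    # Kalman gain
    x_new = x_pred + (K * (z - H @ x_pred)).ravel()
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

# One step: large prior uncertainty, so the estimate moves toward the measurement.
x, P = np.array([0.0, 0.0]), np.eye(2)
x, P = kalman_step(x, P, z=0.1, a=1.0, dt=0.01, q=1e-4, r=1e-2)
```

The paper's estimator presumably uses a far richer rigid-body model and tuned noise terms, but the predict/correct structure above is the Bayesian backbone such estimators share.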