🤖 AI Summary
Quadrupedal robots face significant challenges in generalizing gait control across heterogeneous, missing, or structurally varying sensor configurations and body morphologies.
Method: We propose a masked sensory-temporal attention mechanism built on a lightweight Transformer architecture. To our knowledge, this is the first approach to model attention at the granularity of individual sensors, combining dynamic sensor masking with cross-modal temporal attention to achieve robust state representation under variable input dimensions and severe sensor dropout (up to 70%).
Contribution/Results: Evaluated in simulation and on diverse real-world quadrupeds (e.g., Unitree A1, Go2), our policy demonstrates strong cross-hardware transferability: a single training run adapts to differing sensor suites and mechanical designs. It maintains stable locomotion even under extreme input degradation, substantially improving the robustness and generalizability of learning-based locomotion policies for real-world deployment.
📝 Abstract
With the rising focus on quadrupeds, a generalized policy capable of handling different robot models and sensory inputs would be highly beneficial. Although several methods have been proposed to address different morphologies, it remains challenging for learning-based policies to handle varying combinations of proprioceptive information. This paper presents Masked Sensory-Temporal Attention (MSTA), a novel Transformer-based model with masking for quadruped locomotion. It applies attention directly at the sensor level to improve sensory-temporal understanding and to handle different combinations of sensor data, serving as a foundation for incorporating unseen information. The model can infer the robot's state even when a large portion of the input is missing, and remains efficient enough for deployment on a physical system despite the long input sequence.
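To make the core idea concrete, here is a minimal sketch of sensor-level masked attention: each sensor reading (at each timestep) becomes one token, and dropped sensors are masked out so the remaining tokens attend only to available information. This is a simplified single-head illustration with identity projections, not the paper's actual architecture; the function name, shapes, and the -1e9 masking trick are illustrative assumptions.

```python
import numpy as np

def masked_sensor_attention(tokens, mask):
    """Single-head self-attention over per-sensor tokens.

    tokens: (N, d) array, one embedding per sensor reading
            (conceptually N = num_sensors * num_timesteps).
    mask:   (N,) boolean array; False marks missing/dropped sensors.
    Hypothetical simplification of a masked Transformer layer.
    """
    d = tokens.shape[1]
    q = k = v = tokens                      # identity projections for brevity
    logits = q @ k.T / np.sqrt(d)           # (N, N) attention scores
    logits[:, ~mask] = -1e9                 # dropped sensors receive no attention
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    out = weights @ v
    out[~mask] = 0.0                        # zero the outputs of missing tokens
    return out

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))            # 6 sensor tokens, 8-dim embeddings
mask = np.array([True, True, False, True, False, True])  # 2 of 6 sensors dropped
out = masked_sensor_attention(tokens, mask)
```

Because masking happens in the attention logits rather than in the input layout, the same network can accept any subset of sensors without retraining, which is what lets one policy transfer across sensor suites.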