🤖 AI Summary
Quadrupedal robots face significant challenges in generalizing gait control across heterogeneous, missing, or structurally varying sensor configurations and body morphologies.
Method: We propose a masked sensory-temporal attention mechanism built on a lightweight Transformer architecture. To our knowledge, this is the first approach to model attention at the granularity of individual sensors, combining dynamic sensor masking with cross-modal temporal attention to achieve robust state representation under variable input dimensions and severe sensor dropout (up to 70%).
Contribution/Results: Evaluated in simulation and on diverse real-world quadrupeds (e.g., Unitree A1, Go2), our policy demonstrates strong cross-hardware transferability: a single training run adapts to differing sensor suites and mechanical designs. It maintains stable locomotion even under extreme input degradation, substantially improving the robustness and generalizability of learning-based locomotion policies for real-world deployment.
📝 Abstract
With the rising focus on quadrupeds, a generalized policy capable of handling different robot models and sensory inputs would be highly beneficial. Although several methods have been proposed to address different morphologies, it remains challenging for learning-based policies to handle varying combinations of proprioceptive information. This paper presents Masked Sensory-Temporal Attention (MSTA), a novel Transformer-based model with masking for quadruped locomotion. It applies attention directly at the sensor level to improve sensory-temporal understanding and to handle different combinations of sensor data, serving as a foundation for incorporating unseen information. The model can infer the robot's state even when a large portion of the input is missing, and remains efficient enough for deployment on a physical system despite the long input sequence.
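To make the core idea concrete, here is a minimal sketch of sensor-level masked attention: each sensor reading (at each timestep) becomes one token, and dropped sensors are masked out so the remaining tokens attend only to available information. This is a simplified single-head illustration with identity projections, not the paper's actual architecture; the function name, shapes, and the -1e9 masking trick are illustrative assumptions.

```python
import numpy as np

def masked_sensor_attention(tokens, mask):
    """Single-head self-attention over per-sensor tokens.

    tokens: (N, d) array, one embedding per sensor reading
            (conceptually N = num_sensors * num_timesteps).
    mask:   (N,) boolean array; False marks missing/dropped sensors.
    Hypothetical simplification of a masked Transformer layer.
    """
    d = tokens.shape[1]
    q = k = v = tokens                      # identity projections for brevity
    logits = q @ k.T / np.sqrt(d)           # (N, N) attention scores
    logits[:, ~mask] = -1e9                 # dropped sensors receive no attention
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    out = weights @ v
    out[~mask] = 0.0                        # zero the outputs of missing tokens
    return out

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))            # 6 sensor tokens, 8-dim embeddings
mask = np.array([True, True, False, True, False, True])  # 2 of 6 sensors dropped
out = masked_sensor_attention(tokens, mask)
```

Because masking happens in the attention logits rather than in the input layout, the same network can accept any subset of sensors without retraining, which is what lets one policy transfer across sensor suites.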