🤖 AI Summary
This work addresses the challenge of learning robust locomotion policies for quadrupedal robots from extremely limited human demonstrations. To this end, the authors propose a novel approach that integrates dynamical systems theory with imitation learning. By explicitly analyzing the limit-cycle structure of gaits and the associated Poincaré return map, they introduce a consistency regularization mechanism that aligns variations in the latent representation with variations in the output actions, enforcing local dynamic alignment. The method enables offline training of diverse, robust walking policies using only a few seconds of human demonstration, substantially improving data efficiency and generalization. Hardware experiments on a real quadruped platform demonstrate the approach’s effectiveness and deployment feasibility, and offer insight into why few-shot imitation learning can succeed for quadrupedal locomotion.
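The consistency regularization described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's actual loss: the tiny two-layer policy, the random-perturbation formulation, and the squared gap between latent and action variation magnitudes are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny illustrative policy: observation -> latent z -> action a.
# Weights and dimensions are arbitrary placeholders, not from the paper.
W_enc = rng.normal(size=(8, 4)) * 0.5
W_act = rng.normal(size=(4, 2)) * 0.5

def encode(obs):
    return np.tanh(obs @ W_enc)

def act(z):
    return np.tanh(z @ W_act)

def consistency_penalty(obs, eps=1e-2, n_dirs=16):
    """One plausible form of latent/action alignment: perturb the input in
    random directions and penalize the squared gap between how far the
    latent moves and how far the action moves, so local variations in the
    two spaces stay aligned."""
    z0 = encode(obs)
    a0 = act(z0)
    gaps = []
    for _ in range(n_dirs):
        d = eps * rng.normal(size=obs.shape)
        z1 = encode(obs + d)
        a1 = act(z1)
        dz = np.linalg.norm(z1 - z0)
        da = np.linalg.norm(a1 - a0)
        gaps.append((da - dz) ** 2)
    return float(np.mean(gaps))
```

Adding such a penalty to a behavior-cloning objective would discourage the latent space from amplifying or collapsing local variations relative to the actions, which is one way to read "local dynamic alignment."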
📝 Abstract
Quadruped locomotion provides a natural setting for understanding when model-free learning can outperform model-based control design, by exploiting data patterns to bypass the difficulty of optimizing over discrete contacts and the combinatorial explosion of mode changes. We give a principled analysis of why imitation learning with quadrupeds can be inherently effective in a small data regime, based on the structure of its limit cycles, Poincaré return maps, and local numerical properties of neural networks. The understanding motivates a new imitation learning method that regulates the alignment between variations in a latent space and those over the output actions. Hardware experiments confirm that a few seconds of demonstration is sufficient to train various locomotion policies from scratch entirely offline with reasonable robustness.
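To illustrate the limit-cycle and Poincaré-return-map structure the analysis builds on, here is a minimal numerical sketch on a textbook planar oscillator (in polar form, r' = r(1 − r²), θ' = 1), not the robot dynamics: the return map P on the section θ = 0 has a fixed point at the limit cycle r = 1, and |P'| < 1 at that fixed point certifies the cycle's local stability, which is the property that makes small perturbations around a gait contract.

```python
import numpy as np

def flow_r(r0, T=2 * np.pi, dt=1e-3):
    # Integrate dr/dt = r(1 - r^2) for one period with forward Euler.
    # Since theta' = 1, the return time to the section theta = 0 is 2*pi.
    r = r0
    for _ in range(int(T / dt)):
        r += dt * r * (1.0 - r * r)
    return r

def poincare_map(r0):
    """Return map P on the section theta = 0: radius after one revolution."""
    return flow_r(r0)

# The limit cycle r = 1 is a fixed point of P: P(1) = 1.
P1 = poincare_map(1.0)

# Finite-difference derivative P'(1); |P'(1)| < 1 means nearby
# trajectories contract onto the cycle (stable limit cycle).
eps = 1e-4
dP = (poincare_map(1.0 + eps) - poincare_map(1.0 - eps)) / (2 * eps)
```

The same logic underlies the paper's argument: a stable gait corresponds to an attracting fixed point of the return map, so a policy only needs to be accurate in the thin neighborhood the dynamics keep pulling the state back into, which is one reason a few seconds of demonstration can suffice.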