Teaching Robots Like Dogs: Learning Agile Navigation from Luring, Gesture, and Speech

📅 2026-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the heavy human demonstration burden that arises when legged robots learn to interpret human social cues through physical guidance. To this end, the authors propose an efficient human-in-the-loop learning framework that integrates natural multimodal commands—gestures and speech—with physics-based simulation to reconstruct interactive scenarios. The approach further incorporates a progressive goal cueing strategy and a data aggregation mechanism to mitigate distributional shift under limited demonstration data. Evaluated on six real-world agile navigation tasks, the method achieves a 97.15% task success rate using less than one hour of human demonstrations, substantially improving both sample efficiency and generalization.

📝 Abstract
In this work, we aim to enable legged robots to learn how to interpret human social cues and produce appropriate behaviors through physical human guidance. However, learning through physical engagement can place a heavy burden on users when the process requires large amounts of human-provided data. To address this, we propose a human-in-the-loop framework that enables robots to acquire navigational behaviors in a data-efficient manner and to be controlled via multimodal natural human inputs, specifically gestural and verbal commands. We reconstruct interaction scenes using a physics-based simulation and aggregate data to mitigate distributional shifts arising from limited demonstration data. Our progressive goal cueing strategy adaptively feeds appropriate commands and navigation goals during training, leading to more accurate navigation and stronger alignment between human input and robot behavior. We evaluate our framework across six real-world agile navigation scenarios, including jumping over or avoiding obstacles. Our experimental results show that our proposed method succeeds in almost all trials across these scenarios, achieving a 97.15% task success rate with less than 1 hour of demonstration data in total.
Problem

Research questions and friction points this paper is trying to address.

legged robots
agile navigation
human social cues
multimodal commands
data-efficient learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

human-in-the-loop learning
multimodal human-robot interaction
physics-based simulation
progressive goal cueing
data-efficient robot learning