Efficiently Generating Expressive Quadruped Behaviors via Language-Guided Preference Learning

📅 2025-02-06

📈 Citations: 0

✨ Influential: 0

career value

205K/year

🤖 AI Summary

Quadruped robots exhibit insufficient behavioral expressiveness in social scenarios, and existing approaches struggle to balance natural-language input (high-efficiency but low-resolution) with human preference learning (high-resolution but sample-inefficient). Method: We propose a language-guided preference learning framework that integrates large language model (LLM)-derived behavioral priors, preference-based reinforcement learning, and motion policy optimization, enabling reliable sim-to-real transfer. Contribution/Results: Our key innovation is the first use of LLM-generated behavioral priors to guide the sampling process in preference learning, dramatically improving sample efficiency—convergence is achieved within only four rounds of human feedback. Experiments demonstrate a 3–5× improvement in sample efficiency over both pure language-driven and conventional preference-learning baselines, alongside significant gains in behavioral accuracy and user satisfaction.

Technology Category

Application Category

📝 Abstract

Expressive robotic behavior is essential for the widespread acceptance of robots in social environments. Recent advancements in learned legged locomotion controllers have enabled more dynamic and versatile robot behaviors. However, determining the optimal behavior for interactions with different users across varied scenarios remains a challenge. Current methods either rely on natural language input, which is efficient but low-resolution, or learn from human preferences, which, although high-resolution, is sample inefficient. This paper introduces a novel approach that leverages priors generated by pre-trained LLMs alongside the precision of preference learning. Our method, termed Language-Guided Preference Learning (LGPL), uses LLMs to generate initial behavior samples, which are then refined through preference-based feedback to learn behaviors that closely align with human expectations. Our core insight is that LLMs can guide the sampling process for preference learning, leading to a substantial improvement in sample efficiency. We demonstrate that LGPL can quickly learn accurate and expressive behaviors with as few as four queries, outperforming both purely language-parameterized models and traditional preference learning approaches. Website with videos: https://lgpl-gaits.github.io/

Problem

Research questions and friction points this paper is trying to address.

Generating expressive quadruped behaviors efficiently

Combining language input with preference learning

Improving sample efficiency in behavior learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-Guided Preference Learning

LLMs generate initial samples

Refinement through preference feedback

🔎 Similar Papers

No similar papers found.