Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary

📅 2025-11-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Generating diverse, physically feasible whole-body motions for humanoid robots from natural language instructions remains challenging. Method: We propose a large language–action model framework that establishes a unified human-robot motion vocabulary and integrates discrete motion tokenization, privileged policy distillation, and dynamics-aware reinforcement learning fine-tuning to map language end to end to high-fidelity, dynamically stable motions. Contribution/Results: Our approach is the first to jointly design semantic motion discretization and physics-embedded policy optimization, balancing generalization and physical feasibility. Evaluated in simulation and on a real Unitree G1 robot, it improves on prior methods in motion naturalness, dynamic stability, and multi-step task success rate. This work establishes a scalable, language-conditioned whole-body control paradigm for general embodied intelligence.
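
The discrete motion tokenization mentioned above can be illustrated with a minimal sketch. It assumes a nearest-neighbour (vector-quantization style) lookup against a shared codebook; the summary does not say how Humanoid-LLA builds or queries its vocabulary, so the codebook, the 2-D pose features, and the function names below are all hypothetical.

```python
import math

def nearest_code(frame, codebook):
    """Index of the codebook entry closest to `frame` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(frame, codebook[i]))

def tokenize(motion, codebook):
    """Map a sequence of pose feature vectors to discrete token ids."""
    return [nearest_code(frame, codebook) for frame in motion]

def detokenize(tokens, codebook):
    """Map token ids back to their codebook vectors."""
    return [codebook[t] for t in tokens]

# Hypothetical shared codebook with three motion primitives (toy 2-D features).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Toy human and robot clips, assumed to be already retargeted into the same feature space.
human_clip = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]
robot_clip = [(0.0, 0.1), (1.1, -0.1), (-0.1, 1.2)]

human_tokens = tokenize(human_clip, CODEBOOK)
robot_tokens = tokenize(robot_clip, CODEBOOK)
print(human_tokens, robot_tokens)
```

In this toy setup both clips map to the same token sequence, which is the property a shared vocabulary is meant to provide: a language model can then emit tokens without caring whether the training motion came from a human or a robot.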

📝 Abstract

Enabling humanoid robots to follow free-form language commands is critical for seamless human-robot interaction, collaborative task execution, and general-purpose embodied intelligence. While recent advances have improved low-level humanoid locomotion and robot manipulation, language-conditioned whole-body control remains a significant challenge. Existing methods are often limited to simple instructions and sacrifice either motion diversity or physical plausibility. To address this, we introduce Humanoid-LLA, a Large Language Action Model that maps expressive language commands to physically executable whole-body actions for humanoid robots. Our approach integrates three core components: a unified motion vocabulary that aligns human and humanoid motion primitives into a shared discrete space; a vocabulary-directed controller distilled from a privileged policy to ensure physical feasibility; and a physics-informed fine-tuning stage using reinforcement learning with dynamics-aware rewards to enhance robustness and stability. Extensive evaluations in simulation and on a real-world Unitree G1 humanoid show that Humanoid-LLA delivers strong language generalization while maintaining high physical fidelity, outperforming existing language-conditioned controllers in motion naturalness, stability, and execution success rate.
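
Distilling a controller from a privileged policy, as the abstract describes, is commonly done by having a student regress the teacher's actions while seeing fewer inputs. The following is a toy sketch of that general idea under stated assumptions, not the paper's training procedure: the linear teacher, the constant privileged signal, and the learning rate are all hypothetical.

```python
import random

def teacher_action(obs, privileged):
    """Hypothetical privileged policy: it can read a hidden quantity the student cannot."""
    return 2.0 * obs + 0.5 * privileged

def distill(num_steps=2000, lr=0.05, seed=0):
    """Fit a student `action = w * obs + b` to the teacher's actions by SGD on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(num_steps):
        obs = rng.uniform(-1.0, 1.0)
        privileged = 1.0  # constant in this toy, so the student can absorb it into its bias
        target = teacher_action(obs, privileged)
        err = (w * obs + b) - target
        # Gradient step on 0.5 * err**2 with respect to w and b.
        w -= lr * err * obs
        b -= lr * err
    return w, b

w, b = distill()
print(round(w, 3), round(b, 3))
```

The student never receives `privileged`, yet it recovers the teacher's behaviour on this toy problem because the hidden quantity is predictable from what the student does see. Real deployments rely on the same principle with observation histories in place of a constant.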

Problem

Research questions and friction points this paper is trying to address.

Enabling humanoid robots to follow free-form language commands

Addressing language-conditioned whole-body control challenges

Overcoming limitations in motion diversity and physical plausibility

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified motion vocabulary aligning human and robot primitives

Vocabulary-directed controller distilled from privileged policy

Physics-informed fine-tuning with dynamics-aware reinforcement learning
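
To make the idea of a dynamics-aware reward concrete, here is a minimal sketch. The page does not list the reward terms used by Humanoid-LLA, so the three terms below (reference tracking, torque-limit excess, and a fall penalty), along with their weights and thresholds, are illustrative assumptions only.

```python
import math

def dynamics_aware_reward(joint_pos, ref_pos, torques, torque_limit,
                          base_height, min_height,
                          w_track=1.0, w_torque=0.1, fall_penalty=5.0):
    """Toy reward: track the reference pose, avoid over-limit torques, stay upright.

    All terms and weights are hypothetical; they are not taken from the paper.
    """
    # Tracking term in (0, 1]: equals 1 when the pose matches the reference exactly.
    track_err = sum((q - r) ** 2 for q, r in zip(joint_pos, ref_pos))
    r_track = math.exp(-track_err)
    # Penalise only the part of each torque that exceeds the actuator limit.
    excess = sum(max(0.0, abs(t) - torque_limit) for t in torques)
    # Treat a base height below the threshold as a fall.
    fell = base_height < min_height
    return w_track * r_track - w_torque * excess - (fall_penalty if fell else 0.0)

stable = dynamics_aware_reward([0.1, 0.2], [0.1, 0.2], [10.0, -5.0], 20.0, 0.75, 0.4)
fallen = dynamics_aware_reward([0.1, 0.2], [0.1, 0.2], [10.0, -5.0], 20.0, 0.20, 0.4)
print(stable, fallen)
```

A reward of this shape lets reinforcement learning trade a little tracking accuracy for motions the hardware can actually execute, which is the balance between naturalness and physical feasibility the listed contributions point to.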
