AI Summary
Existing physics-driven whole-body dexterous manipulation approaches rely on precise trajectory tracking or VR-based teleoperation, which makes them ill-suited for long-horizon, weakly constrained loco-manipulation tasks (e.g., "grasp cup → transport → insert into slot") and unable to respond naturally to sparse high-level objectives (e.g., target object poses, key body configurations). This work introduces the first unified generative whole-body control policy. It employs a two-stage learning framework: first, a physics-aware tracking controller is trained on motion-capture data; second, that controller is distilled with masked action generation, decoupling high-level goals from low-level motor execution. By combining rigid-body dynamics simulation with goal-conditioned sequence modeling, the method achieves high-fidelity, stable, real-time control in complex loco-manipulation scenarios, improves generalization to unseen targets, and produces more natural interactions, enabling partial-goal-driven animation synthesis.
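The summary does not spell out how "masked action generation" conditions the policy, so the sketch below illustrates one plausible reading: unspecified goal dimensions are zeroed out and a binary mask tells the network which targets are active. All names, dimensions, and the network architecture (`MaskedGoalPolicy`, `goal_mask`, etc.) are illustrative assumptions, not the paper's actual interface.

```python
import torch
import torch.nn as nn

class MaskedGoalPolicy(nn.Module):
    """Hypothetical goal-conditioned policy: goal entries the user leaves
    unspecified are zeroed, and a binary mask marks which targets are active."""

    def __init__(self, obs_dim: int, goal_dim: int, act_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 2 * goal_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor, goal: torch.Tensor, goal_mask: torch.Tensor):
        # Zero out unspecified goal dimensions and append the mask so the network
        # can distinguish "goal value is 0" from "no goal was given".
        masked_goal = goal * goal_mask
        return self.net(torch.cat([obs, masked_goal, goal_mask], dim=-1))

# Example: condition only on a target object pose, leaving body goals free.
obs = torch.randn(1, 128)        # proprioception + object state (sizes assumed)
goal = torch.randn(1, 32)        # [object pose | key body targets] (layout assumed)
mask = torch.zeros(1, 32)
mask[:, :7] = 1.0                # first 7 dims: object position + orientation quaternion
policy = MaskedGoalPolicy(obs_dim=128, goal_dim=32, act_dim=30)
action = policy(obs, goal, mask)
```

In this reading, masking at the input lets a single policy handle any subset of goals (object only, body only, or both) without retraining.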
Abstract
Humans interact with their world while leveraging precise full-body control to achieve versatile goals. This versatility allows them to solve long-horizon, underspecified problems, such as placing a cup in a sink, by seamlessly sequencing actions like approaching the cup, grasping it, transporting it, and finally placing it in the sink. Such goal-driven control can enable new procedural tools for animation systems, allowing users to define partial objectives while the system naturally "fills in" the intermediate motions. However, while current methods for whole-body dexterous manipulation in physics-based animation achieve success in specific interaction tasks, they typically employ control paradigms (e.g., detailed kinematic motion tracking, continuous object trajectory following, or direct VR teleoperation) that offer limited versatility for high-level goal specification across the entire coupled human-object system. To bridge this gap, we present MaskedManipulator, a unified and generative policy developed through a two-stage learning approach. First, our system trains a tracking controller to physically reconstruct complex human-object interactions from large-scale human mocap datasets. This tracking controller is then distilled into MaskedManipulator, which provides users with intuitive control over both the character's body and the manipulated object. As a result, MaskedManipulator enables users to specify complex loco-manipulation tasks through intuitive high-level objectives (e.g., target object poses, key character stances), and then synthesizes the necessary full-body actions for a physically simulated humanoid to achieve these goals, paving the way for more interactive and life-like virtual characters.
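To make the two-stage approach more concrete, here is a minimal sketch of one possible distillation update, in which a privileged tracking teacher (which sees the full reference motion) supervises the masked-goal student via behavior cloning on shared states. The function name, loss, and inputs are assumptions for illustration, not the authors' implementation; in practice such distillation often samples states from student rollouts (DAgger-style) rather than from the teacher alone.

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, obs, full_reference, goal, goal_mask, optimizer):
    """One hypothetical distillation update: the privileged tracking teacher
    produces target actions, and the masked-goal student is regressed onto them."""
    with torch.no_grad():
        teacher_action = teacher(obs, full_reference)   # teacher sees the full mocap reference
    student_action = student(obs, goal, goal_mask)       # student sees only sparse, masked goals
    loss = F.mse_loss(student_action, teacher_action)    # imitate the teacher's actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```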