MaskedManipulator: Versatile Whole-Body Control for Loco-Manipulation

📅 2025-05-25
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing physics-based approaches to whole-body dexterous manipulation rely on precise trajectory tracking or VR-based teleoperation, which leaves them ill-suited for long-horizon, weakly constrained loco-manipulation tasks (e.g., "grasp cup → transport → insert into slot") and unable to respond naturally to high-level objectives (e.g., target object poses, key body configurations). This work introduces a unified, generative whole-body control policy trained in two stages: first, a physics-aware tracking controller is trained to reproduce motion-capture data; second, that controller is distilled into a goal-conditioned policy using masked goal specification, decoupling high-level objectives from low-level motor execution. By combining rigid-body physics simulation with goal-conditioned sequence modeling, the method achieves high-fidelity, stable, real-time control in complex loco-manipulation scenarios, improves generalization to unseen targets, and yields more natural interactions, enabling animation synthesis driven by partial goals.
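The summary above describes specifying only a subset of goals (e.g., a target object pose but no body keyframes) while the policy fills in the rest. The paper's exact conditioning interface is not given here, but one common way to realize such masked goal conditioning is to zero out unspecified goal features and append a binary validity mask, so the policy can tell "unspecified" apart from "zero-valued". The function below is a minimal sketch under that assumption; the name `masked_goal_encoding` and the feature layout are illustrative, not from the paper.

```python
import numpy as np

def masked_goal_encoding(goals, mask, pad_value=0.0):
    """Flatten a set of goal features into one conditioning vector.

    Unspecified goals are replaced by `pad_value`, and a binary flag per
    goal is appended so the downstream policy can distinguish a masked
    goal from one whose value happens to equal the pad value.

    goals: dict mapping goal name -> 1-D feature array
    mask:  dict mapping goal name -> bool (True = goal is specified)
    """
    features, flags = [], []
    for name in sorted(goals):  # sorted for a stable feature order
        feat = np.asarray(goals[name], dtype=np.float64)
        if mask.get(name, False):
            features.append(feat)
            flags.append(np.ones(1))
        else:
            features.append(np.full_like(feat, pad_value))
            flags.append(np.zeros(1))
    return np.concatenate(features + flags)
```

For example, a user could specify only the object's target position and leave the character's root height to the policy; the encoding keeps the vector length fixed so the same network handles any subset of goals.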

πŸ“ Abstract
Humans interact with their world while leveraging precise full-body control to achieve versatile goals. This versatility allows them to solve long-horizon, underspecified problems, such as placing a cup in a sink, by seamlessly sequencing actions like approaching the cup, grasping, transporting it, and finally placing it in the sink. Such goal-driven control can enable new procedural tools for animation systems, enabling users to define partial objectives while the system naturally "fills in" the intermediate motions. However, while current methods for whole-body dexterous manipulation in physics-based animation achieve success in specific interaction tasks, they typically employ control paradigms (e.g., detailed kinematic motion tracking, continuous object trajectory following, or direct VR teleoperation) that offer limited versatility for high-level goal specification across the entire coupled human-object system. To bridge this gap, we present MaskedManipulator, a unified and generative policy developed through a two-stage learning approach. First, our system trains a tracking controller to physically reconstruct complex human-object interactions from large-scale human mocap datasets. This tracking controller is then distilled into MaskedManipulator, which provides users with intuitive control over both the character's body and the manipulated object. As a result, MaskedManipulator enables users to specify complex loco-manipulation tasks through intuitive high-level objectives (e.g., target object poses, key character stances), and MaskedManipulator then synthesizes the necessary full-body actions for a physically simulated humanoid to achieve these goals, paving the way for more interactive and life-like virtual characters.
Problem

Research questions and friction points this paper is trying to address.

Achieve versatile whole-body control for loco-manipulation tasks
Bridge gap between high-level goal specification and detailed motion synthesis
Enable intuitive control of human-object interactions in virtual environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage learning for unified policy
Tracking controller from mocap data
Intuitive high-level task specification
Chen Tessler
Research Scientist, NVIDIA Research
Reinforcement Learning, Physics Simulation, Robotics
Yifeng Jiang
NVIDIA
Erwin Coumans
NVIDIA
Zhengyi Luo
NVIDIA
Gal Chechik
NVIDIA, Bar Ilan University
Machine learning, AI, Machine perception
Xue Bin Peng
NVIDIA and Simon Fraser University