🤖 AI Summary
Problem: Dynamic bipedal robots struggle to achieve smooth, continuous transitions between locomotion and manipulation modes in integrated loco-manipulation tasks (e.g., robot soccer).
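Method: Conventional hierarchical control trains low-level skill policies and switches between them with a high-level policy or a hand-designed finite state machine. This work instead proposes Preferenced Oracle Guided Multi-mode Policies (OGMP), a single end-to-end reinforcement learning policy that learns all required modes and a preferred sequence of transitions between them. Hybrid automata serve as oracles that generate references with continuous dynamics and discrete mode jumps, guiding policy optimization through bounded exploration. A task-agnostic preference reward enforces the desired sequence of mode transitions, and the policy performs whole-body control with contact-rich interaction. Because the oracle abstracts the task, the same reward definition and weights are reused across humanoids with different morphologies (HECTOR V1, Berkeley Humanoid, Unitree G1, and H1).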
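To make the oracle idea concrete, here is a minimal sketch of a hybrid automaton that produces a reference trajectory for a 1-D soccer-like task. It assumes three hypothetical modes ("approach", "dribble", "kick"), constant-speed flows, and distance-based guards. Every name, speed, distance, and the bound `eps` below is illustrative and is not taken from the paper. The `within_bound` check is one assumed way to realize bounded exploration: the rollout continues only while the robot stays close to the oracle reference.

```python
from dataclasses import dataclass

# Hypothetical parameters for illustration only (not from the paper).
SPEED = 1.5          # reference running speed (m/s)
CONTACT_DIST = 0.3   # guard distance for approach -> dribble (m)
KICK_DIST = 3.0      # guard distance to goal for dribble -> kick (m)
GOAL_X = 10.0        # goal position (m)


@dataclass
class OracleState:
    mode: str      # discrete mode: "approach", "dribble", or "kick"
    x: float       # reference robot base position (m), 1-D for simplicity
    ball_x: float  # ball position (m)


def oracle_step(s: OracleState, dt: float) -> OracleState:
    """Advance the hybrid automaton one step: continuous flow, then guard check."""
    if s.mode == "approach":
        # Flow: the reference runs toward a stationary ball.
        x = s.x + SPEED * dt
        # Guard: close enough to the ball -> discrete jump to dribbling.
        mode = "dribble" if s.ball_x - x <= CONTACT_DIST else "approach"
        return OracleState(mode, x, s.ball_x)
    if s.mode == "dribble":
        # Flow: robot and ball advance together toward the goal.
        x = s.x + SPEED * dt
        ball_x = s.ball_x + SPEED * dt
        # Guard: within kicking range of the goal -> jump to kick.
        mode = "kick" if GOAL_X - ball_x <= KICK_DIST else "dribble"
        return OracleState(mode, x, ball_x)
    return s  # "kick" is terminal in this sketch


def within_bound(robot_x: float, ref: OracleState, eps: float = 0.5) -> bool:
    """Assumed bounded-exploration test: keep the rollout alive only while
    the robot is within eps of the oracle's reference position."""
    return abs(robot_x - ref.x) <= eps
```

In a training loop, the policy would observe the current oracle state as a reference, and an episode would end early once `within_bound` returns `False`. This keeps exploration near the mode sequence the automaton prescribes without requiring exact tracking.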
Results: A single learned policy performs dynamic loco-manipulation end to end, including sprinting toward the ball, contact-rich dribbling, goal kicks, and ball stops. The same approach also solves omnidirectional box moving.
📝 Abstract
Dynamic loco-manipulation calls for effective whole-body control and contact-rich interactions with the object and the environment. Existing learning-based control synthesis trains low-level skill policies and switches between them explicitly with a high-level policy or a hand-designed finite state machine, leading to quasi-static behaviors. In contrast, dynamic tasks such as soccer require the robot to run towards the ball, decelerate to an optimal approach to dribble, and eventually kick a goal: a continuum of smooth motion. To this end, we propose Preferenced Oracle Guided Multi-mode Policies (OGMP) to learn a single policy that masters all the required modes and the preferred sequence of transitions to solve uni-object loco-manipulation tasks. We design hybrid automata as oracles that generate references with continuous dynamics and discrete mode jumps, enabling guided policy optimization through bounded exploration. To enforce learning a desired sequence of mode transitions, we present a task-agnostic preference reward that enhances performance. The proposed approach demonstrates successful loco-manipulation for tasks like soccer and moving boxes omnidirectionally through whole-body control. In soccer, a single policy learns to optimally reach the ball, transition to contact-rich dribbling, and execute successful goal kicks and ball stops. Leveraging the oracle's abstraction, we solve each loco-manipulation task on robots with varying morphologies, including HECTOR V1, Berkeley Humanoid, Unitree G1, and H1, using the same reward definition and weights.
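One way to read "task-agnostic preference reward" is as a reward that depends only on where the current mode sits in a preferred ordering, not on task quantities such as ball or box positions. The sketch below assumes that reading. It grants a bonus for advancing to the next preferred mode, gives a neutral reward for staying in the current mode, and applies a penalty for skipping ahead or reverting. The function name, the bonus and penalty values, and the box-moving mode list are hypothetical, and the paper's actual reward may be defined differently.

```python
from typing import Sequence


def preference_reward(prev_mode: str, mode: str, preferred: Sequence[str],
                      bonus: float = 1.0, penalty: float = -1.0) -> float:
    """Hypothetical task-agnostic preference reward.

    Only the index of each mode in the preferred order is used, so the same
    function can score soccer or box moving given the appropriate mode list.
    """
    i = preferred.index(prev_mode)
    j = preferred.index(mode)
    if j == i:
        return 0.0      # remaining in the current mode: neutral
    if j == i + 1:
        return bonus    # advancing to the next preferred mode
    return penalty      # skipping a mode or reverting to an earlier one
```

For example, with `preferred = ["approach", "dribble", "kick"]`, moving from "approach" to "dribble" earns the bonus, while jumping straight from "approach" to "kick" is penalized. Swapping in a hypothetical box-moving list such as `["approach", "grasp", "carry", "place"]` reuses the function and its weights unchanged, which mirrors the cross-task reuse the abstract describes.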