Preferenced Oracle Guided Multi-mode Policies for Dynamic Bipedal Loco-Manipulation

📅 2024-10-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Dynamic bipedal robots struggle to transition smoothly and continuously between locomotion and manipulation modes in integrated loco-manipulation tasks (e.g., robot soccer). Method: This work departs from conventional hierarchical control, in which a high-level policy or finite state machine switches between low-level skill policies, and instead trains a single end-to-end reinforcement learning policy guided by a preference-aware oracle. The oracle is a hybrid automaton that generates references with continuous dynamics and discrete mode jumps; these references guide policy optimization through bounded exploration, and a task-agnostic preference reward encourages the desired sequence of mode transitions. Because the oracle abstracts away the robot, the same reward definition and weights are used across robots with different morphologies (e.g., HECTOR V1, Berkeley Humanoid, Unitree G1, and H1). Results: The learned policy performs dynamic loco-manipulation end to end, including running to the ball, contact-rich dribbling, goal kicks, and ball stops in soccer, as well as moving boxes omnidirectionally.

📝 Abstract

Dynamic loco-manipulation calls for effective whole-body control and contact-rich interactions with the object and the environment. Existing learning-based control synthesis relies on training low-level skill policies and explicitly switching with a high-level policy or a hand-designed finite state machine, leading to quasi-static behaviors. In contrast, dynamic tasks such as soccer require the robot to run towards the ball, decelerate to an optimal approach to dribble, and eventually kick a goal - a continuum of smooth motion. To this end, we propose Preferenced Oracle Guided Multi-mode Policies (OGMP) to learn a single policy mastering all the required modes and preferred sequence of transitions to solve uni-object loco-manipulation tasks. We design hybrid automatons as oracles to generate references with continuous dynamics and discrete mode jumps to perform a guided policy optimization through bounded exploration. To enforce learning a desired sequence of mode transitions, we present a task-agnostic preference reward that enhances performance. The proposed approach demonstrates successful loco-manipulation for tasks like soccer and moving boxes omnidirectionally through whole-body control. In soccer, a single policy learns to optimally reach the ball, transition to contact-rich dribbling, and execute successful goal kicks and ball stops. Leveraging the oracle's abstraction, we solve each loco-manipulation task on robots with varying morphologies, including HECTOR V1, Berkeley Humanoid, Unitree G1, and H1, using the same reward definition and weights.
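The oracle idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: the mode names, guard thresholds, reference velocities, the 1-D pitch, and the helper names (`step_oracle`, `rollout`, `within_bound`) are all hypothetical, chosen only to show a hybrid automaton that combines continuous flow with discrete mode jumps, plus one possible reading of "bounded exploration" as keeping the policy within a fixed distance of the oracle reference.

```python
from dataclasses import dataclass

# Hypothetical mode labels for a soccer-like task (illustration only).
REACH, DRIBBLE, KICK = "reach", "dribble", "kick"


@dataclass
class State:
    robot_x: float      # robot position along a 1-D pitch (m)
    ball_x: float       # ball position along the same axis (m)
    mode: str = REACH   # current discrete mode of the automaton


def step_oracle(s: State, goal_x: float, dt: float = 0.1) -> State:
    """Advance the reference one step: apply guards, then integrate the flow."""
    mode = s.mode
    # Guards: discrete mode jumps (thresholds are assumed values).
    if mode == REACH and s.ball_x - s.robot_x <= 0.3:
        mode = DRIBBLE
    elif mode == DRIBBLE and goal_x - s.ball_x <= 2.0:
        mode = KICK
    # Flows: per-mode continuous dynamics, here constant reference velocities.
    if mode == REACH:
        robot_v, ball_v = 2.0, 0.0   # run towards a stationary ball
    elif mode == DRIBBLE:
        robot_v, ball_v = 1.0, 1.0   # robot and ball move together
    else:
        robot_v, ball_v = 0.0, 4.0   # robot stops, ball travels to the goal
    return State(s.robot_x + robot_v * dt, s.ball_x + ball_v * dt, mode)


def rollout(s: State, goal_x: float, steps: int) -> list:
    """Generate a reference trajectory of states from the oracle."""
    traj = [s]
    for _ in range(steps):
        s = step_oracle(s, goal_x)
        traj.append(s)
    return traj


def within_bound(actual_x: float, ref_x: float, eps: float = 0.5) -> bool:
    """Assumed bounded-exploration check: stay within eps of the reference."""
    return abs(actual_x - ref_x) <= eps
```

In a training loop, a check like `within_bound` could be used to end an episode once the policy drifts too far from the reference; whether the paper bounds exploration in exactly this way is not stated in the abstract.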

Problem

Research questions and friction points this paper is trying to address.

Develop a single policy for dynamic bipedal loco-manipulation tasks

Enable smooth transitions between motion modes without explicit switching

Achieve whole-body control across diverse robot morphologies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Single policy mastering multiple modes

Hybrid automatons for guided optimization

Task-agnostic preference reward enhancement
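The page does not give the form of the preference reward, so the following is only a guess at its general shape. It is a minimal sketch assuming the reward depends solely on the sequence of mode labels, which is what would make it independent of the task's physical details. The function name `preference_reward` and the `bonus` and `penalty` parameters are hypothetical.

```python
def preference_reward(mode_history, preferred_order, bonus=1.0, penalty=1.0):
    """Score visited modes against a preferred ordering of mode transitions.

    A transition to the next mode in `preferred_order` earns `bonus`;
    any other change of mode (skipping ahead or going back) costs `penalty`.
    Staying in the same mode contributes nothing.
    """
    rank = {mode: i for i, mode in enumerate(preferred_order)}
    reward = 0.0
    for prev, cur in zip(mode_history, mode_history[1:]):
        if cur == prev:
            continue  # no transition at this step
        if rank[cur] == rank[prev] + 1:
            reward += bonus    # preferred transition
        else:
            reward -= penalty  # out-of-order transition
    return reward


order = ["reach", "dribble", "kick"]
in_order = preference_reward(["reach", "reach", "dribble", "kick"], order)
skipped = preference_reward(["reach", "kick"], order)
```

Here `in_order` rewards both transitions, while `skipped` is penalized for jumping from reaching straight to kicking. The same function applies unchanged to any task whose modes can be listed in a preferred order.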
