AI Summary
Problem: Quadrupedal robots remain limited in long-horizon, obstacle-aware manipulation of large objects in complex environments.
Method: This paper proposes a hierarchical multi-agent reinforcement learning framework: at the high level, a centralized adaptive policy coordinated with RRT-based planning generates task commands; at the mid level, a decentralized goal-conditioned policy enables collaborative decision-making among multiple robots; and at the low level, a pre-trained locomotion controller ensures robust motion generation. The framework supports centralized training with decentralized execution (CTDE) and enables efficient sim-to-real transfer on the Go1 platform.
Results: Experiments demonstrate a 36.0% improvement in task success rate and a 24.5% reduction in completion time in simulation. Crucially, the method achieves the first successful real-robot execution of long-horizon, obstacle-aware pushing tasks (Push-Cuboid and Push-T) on physical quadrupeds, significantly enhancing operational practicality for applications such as search-and-rescue and industrial automation.
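To make the three-level decomposition concrete, here is a minimal, self-contained sketch of the control flow it describes. All class and function names are illustrative, not the paper's actual API; the mid-level "policy" is a simple proportional rule standing in for the learned goal-conditioned policy, and the low level integrates velocity commands directly in place of the pre-trained locomotion controller.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

class HighLevel:
    """High level (centralized): consumes an RRT waypoint list and emits the
    next subgoal, advancing once the object gets within a tolerance
    (0.3 m here is an assumed value, not from the paper)."""
    def __init__(self, waypoints, tol=0.3):
        self.waypoints, self.tol = list(waypoints), tol

    def subgoal(self, object_pos):
        while len(self.waypoints) > 1 and dist(object_pos, self.waypoints[0]) < self.tol:
            self.waypoints.pop(0)
        return self.waypoints[0]

class MidLevelAgent:
    """Mid level (decentralized): each robot independently maps its own pose
    plus the shared subgoal to a velocity command -- CTDE at execution time.
    A proportional rule stands in for the learned policy."""
    def command(self, robot_pos, subgoal, gain=1.0, v_max=0.5):
        dx, dy = subgoal[0] - robot_pos[0], subgoal[1] - robot_pos[1]
        n = math.hypot(dx, dy) or 1.0
        s = min(v_max, gain * n) / n          # clamp commanded speed
        return (dx * s, dy * s)

def step_low_level(pos, vel, dt=0.1):
    """Low level stand-in: the pre-trained locomotion policy would track the
    velocity command on hardware; here we simply integrate it."""
    return (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)
```

A usage loop would query `HighLevel.subgoal` each tick, have every `MidLevelAgent` compute its own command from that shared subgoal, and pass the commands to the low level, which is the interaction pattern the summary above describes.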
Abstract
Recently, quadrupedal robots have achieved significant success in locomotion, but their manipulation capabilities, particularly in handling large objects, remain limited, restricting their usefulness in demanding real-world applications such as search and rescue, construction, industrial automation, and room organization. This paper tackles the task of obstacle-aware, long-horizon pushing by multiple quadrupedal robots. We propose a hierarchical multi-agent reinforcement learning framework with three levels of control. The high-level controller integrates an RRT planner and a centralized adaptive policy to generate subgoals, while the mid-level controller uses a decentralized goal-conditioned policy to guide the robots toward these subgoals. A pre-trained low-level locomotion policy executes the movement commands. We evaluate our method against several baselines in simulation, demonstrating significant improvements: a 36.0% higher success rate and a 24.5% reduction in completion time relative to the best baseline. Our framework successfully enables long-horizon, obstacle-aware manipulation tasks like Push-Cuboid and Push-T on Go1 robots in the real world.
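The high-level controller relies on an RRT planner to produce obstacle-aware waypoints for the object. As a reference for how RRT generates such a waypoint sequence, here is a minimal 2-D sketch; the step size, goal bias, sampling bounds, and collision test are assumptions for illustration, not parameters from the paper.

```python
import math
import random

def rrt_plan(start, goal, is_free, step=0.5, goal_tol=0.6, iters=5000, seed=0):
    """Grow a rapidly-exploring random tree from start toward goal in 2-D.
    is_free(p) -> bool reports whether point p is collision-free.
    Returns a start-to-goal waypoint list, or None on failure."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(iters):
        # Goal bias: sample the goal 10% of the time to speed convergence.
        sample = goal if rng.random() < 0.1 else (rng.uniform(-5, 5), rng.uniform(-5, 5))
        # Find the nearest existing tree node to the sample.
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        near = nodes[i]
        d = math.dist(near, sample)
        if d == 0:
            continue
        # Steer a fixed step from the nearest node toward the sample.
        new = (near[0] + step * (sample[0] - near[0]) / d,
               near[1] + step * (sample[1] - near[1]) / d)
        if not is_free(new):
            continue
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:
            # Walk parent pointers back to the root to recover the path.
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None
```

In the framework above, the resulting waypoints would be handed to the centralized adaptive policy as candidate subgoals; practical variants typically add path smoothing before execution.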