🤖 AI Summary
This work addresses two bottlenecks in robot learning: the high cost of data collection and the limited diversity of human demonstrations, which often fail to cover the state space required for complex manipulation tasks. The authors propose a task-agnostic method that operates on a black-box simulator and combines RRT-style exploration, sampling-based model predictive control (MPC), and a novel sampling scheme that draws targets from a manifold of stable states. This scheme guides exploration toward stable configurations without restricting the planner to purely stable motions, so long-horizon search trees can be grown directly in simulation. The method discovers a wide range of manipulation strategies, including pushing, grasping, pivoting, throwing, and tool use, across different robot morphologies, addressing the lack of solution diversity in approaches based on local trajectory optimization.
📝 Abstract
Scaling up datasets is highly effective at improving the performance of deep learning models, including in the field of robot learning. However, data collection remains a bottleneck. Approaches relying on collecting human demonstrations are labor-intensive and inherently limited: the demonstrations tend to be narrow and task-specific, and they fail to adequately explore the full space of feasible states. Synthetic data generation could remedy this, but current techniques mostly rely on local trajectory optimization and fail to find diverse solutions. In this work, we propose a novel method capable of finding diverse long-horizon manipulations through black-box simulation. We achieve this by combining an RRT-style search with sampling-based model predictive control (MPC), together with a novel sampling scheme that guides the exploration toward stable configurations. Specifically, we sample from a manifold of stable states while growing a search tree directly through simulation, without restricting the planner to purely stable motions. We demonstrate the method's ability to discover diverse manipulation strategies, including pushing, grasping, pivoting, throwing, and tool use, across different robot morphologies, without task-specific guidance.
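To make the combination of RRT-style search, sampling-based MPC, and stable-state sampling concrete, the following is a minimal sketch on a toy problem and not the authors' implementation. It assumes a damped 2D point mass in place of a real physics simulator, takes the "stable manifold" to be the set of zero-velocity states, and uses plain random shooting as the sampling-based MPC. All names (`simulate`, `sample_stable_state`, `mpc_extend`, `grow_tree`) and all parameter values are hypothetical.

```python
import numpy as np

def simulate(state, control, dt=0.1):
    """Stand-in for a black-box simulator: a damped 2D point mass (assumption)."""
    pos, vel = state[:2], state[2:]
    vel = 0.9 * vel + dt * control
    pos = pos + dt * vel
    return np.concatenate([pos, vel])

def sample_stable_state(rng):
    """Sample a target from the toy stable manifold: any position, zero velocity."""
    return np.concatenate([rng.uniform(-1.0, 1.0, size=2), np.zeros(2)])

def mpc_extend(state, target, rng, n_samples=32, horizon=5):
    """Random-shooting MPC: keep the rollout whose end state is nearest the target."""
    best_state, best_controls, best_cost = None, None, np.inf
    for _ in range(n_samples):
        controls = rng.uniform(-1.0, 1.0, size=(horizon, 2))
        s = state
        for u in controls:
            s = simulate(s, u)  # the planner only ever queries the simulator
        cost = np.linalg.norm(s - target)
        if cost < best_cost:
            best_state, best_controls, best_cost = s, controls, cost
    return best_state, best_controls

def grow_tree(start, n_iters=200, seed=0):
    """RRT-style loop: sample a stable target, pick the nearest node, extend by MPC."""
    rng = np.random.default_rng(seed)
    nodes = [np.asarray(start, dtype=float)]
    parents = [-1]     # index of each node's parent; -1 marks the root
    controls = [None]  # control sequence that leads from the parent to the node
    for _ in range(n_iters):
        target = sample_stable_state(rng)
        nearest = int(np.argmin([np.linalg.norm(n - target) for n in nodes]))
        new_state, u_seq = mpc_extend(nodes[nearest], target, rng)
        # The new node is whatever the simulator produced; it need not be stable.
        nodes.append(new_state)
        parents.append(nearest)
        controls.append(u_seq)
    return nodes, parents, controls

nodes, parents, controls = grow_tree(np.zeros(4))
```

In this sketch only the sampled targets lie on the stable manifold. The nodes stored in the tree are raw simulator outputs and may carry nonzero velocity, which mirrors the abstract's point that the planner is guided toward stable configurations without being restricted to purely stable motions. A long-horizon trajectory can be read out by following `parents` from any node back to the root and concatenating the stored control sequences.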