AI Summary
This work addresses the limitations of directly modeling action spaces in long-horizon robotic manipulation tasks by proposing a task learning approach based on sequential graphical representations of object relationships. Instead of imitating low-level actions, the method infers high-level task goals from a small number of demonstrations. It globally models the evolution of scene states through a cross-stage object interaction graph and enhances robustness in multi-demonstration learning by integrating demonstration segmentation, pooling mechanisms, and pre-trained visual feature matching. Experimental results demonstrate that the approach accurately segments task phases, learns sparse yet effective task representations, and enables reliable execution across both simulated and real-world robotic environments.
Abstract
Learning long-horizon manipulation tasks efficiently is a central challenge in robot learning from demonstration. Unlike recent efforts that learn tasks directly in the action domain, we infer what the robot should achieve in the task rather than how to do so. To this end, we represent evolving scene states as a series of graphical object relationships. We propose a demonstration segmentation and pooling approach that extracts a series of manipulation graphs and estimates distributions over object states across task phases. In contrast to prior graph-based methods that capture only partial interactions or short temporal windows, our approach captures complete object interactions spanning from the onset of control to the end of the manipulation. To improve robustness when learning from multiple demonstrations, we additionally perform object matching using pre-trained visual features. In extensive experiments, we evaluate our method's demonstration segmentation accuracy and the utility of learning from multiple demonstrations for finding a desired minimal task model. Finally, we deploy the fitted models both in simulation and on a real robot, demonstrating that the resulting task representations support reliable execution across environments.
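To make the core idea concrete, the following is a minimal illustrative sketch (not the paper's actual pipeline): a demonstration is a sequence of scene states, each a set of object-relation triples, and phase boundaries are placed wherever the relational structure changes. The object names and relations shown are hypothetical examples.

```python
# Illustrative sketch only: scene states as object-relation graphs,
# segmented into phases wherever the relation set changes between frames.
# Objects and relations here are invented examples, not the paper's API.

def relations(frame):
    """A frame is a list of (subject, relation, object) triples."""
    return frozenset(frame)

def segment(frames):
    """Split a demonstration into phases at every relational change."""
    phases, current = [], [frames[0]]
    for prev, frame in zip(frames, frames[1:]):
        if relations(prev) != relations(frame):
            phases.append(current)
            current = []
        current.append(frame)
    phases.append(current)
    return phases

# A toy pick-and-place demonstration: cup moves table -> gripper -> shelf.
demo = [
    [("cup", "on", "table")],
    [("cup", "on", "table")],
    [("cup", "in", "gripper")],
    [("cup", "in", "gripper")],
    [("cup", "on", "shelf")],
]
print(len(segment(demo)))  # 3 phases
```

Pooling such segmented graph sequences across multiple demonstrations (with objects matched via visual features) would then yield per-phase distributions over object states, which is the kind of sparse task model the abstract describes.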