🤖 AI Summary
This work tackles planning for long-horizon, complex robotic manipulation tasks from partial-view point clouds: tasks requiring geometric reasoning, multi-object interaction, and reasoning about occluded objects. We propose a hierarchical planning framework that integrates large language models (LLMs) with a sampling-based optimizer over continuous parameters. Key contributions: (1) a novel relational dynamics model trained exclusively on single-step simulation data, enabling zero-shot generalization to arbitrary-length real-world tasks; (2) a unified relational representation that bridges point-cloud perception, natural-language instruction understanding, and physically feasible action generation; and (3) a geometry- and occlusion-aware point-cloud encoder coupled with multimodal prompt engineering. On real-world long-horizon tasks, our method achieves a success rate above 85%, significantly surpassing the best prior baseline (50%), and demonstrates strong generalization across challenging scenarios involving multi-object interaction, geometric reasoning, and occlusion-aware manipulation.
📝 Abstract
We present Points2Plans, a framework for composable planning with a relational dynamics model that enables robots to solve long-horizon manipulation tasks from partial-view point clouds. Given a language instruction and a point cloud of the scene, our framework initiates a hierarchical planning procedure, whereby a language model generates a high-level plan and a sampling-based planner produces constraint-satisfying continuous parameters for manipulation primitives sequenced according to the high-level plan. Key to our approach is the use of a relational dynamics model as a unifying interface between the continuous and symbolic representations of states and actions, thus facilitating language-driven planning from high-dimensional perceptual input such as point clouds. Whereas previous relational dynamics models require training on datasets of multi-step manipulation scenarios that align with the intended test scenarios, Points2Plans uses only single-step simulated training data while generalizing zero-shot to a variable number of steps during real-world evaluations. We evaluate our approach on tasks involving geometric reasoning, multi-object interactions, and occluded object reasoning in both simulated and real-world settings. Results demonstrate that Points2Plans offers strong generalization to unseen long-horizon tasks in the real world, where it solves over 85% of evaluated tasks while the next best baseline solves only 50%.
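The hierarchical procedure described above can be sketched as a loop: a language model proposes a symbolic primitive sequence, and for each primitive a sampling-based planner searches for continuous parameters that the relational dynamics model predicts will satisfy the task constraints. The sketch below is a minimal illustration under assumed interfaces; all function names (`llm_high_level_plan`, `dynamics_rollout`, `satisfies_constraints`) are hypothetical stand-ins, not the paper's actual API.

```python
import random

def llm_high_level_plan(instruction):
    # Hypothetical stand-in: a real system would query an LLM with the
    # instruction and a symbolic description of the perceived scene.
    return [("pick", "mug"), ("place", "mug", "shelf")]

def dynamics_rollout(state, primitive, params):
    # Hypothetical stand-in for the learned relational dynamics model:
    # predicts the next (relational) state after executing one primitive.
    return {**state, "last": (primitive, tuple(params))}

def satisfies_constraints(state):
    # Hypothetical feasibility check, e.g. no collisions and the
    # desired object relations hold in the predicted state.
    return True

def plan(instruction, state, samples_per_step=32):
    """Hierarchical planning sketch: symbolic plan from the LLM, then
    sampling-based search over continuous primitive parameters,
    validated step by step with the dynamics model."""
    full_plan = []
    for step in llm_high_level_plan(instruction):
        for _ in range(samples_per_step):
            # Sample candidate continuous parameters (e.g. grasp pose).
            params = [random.uniform(-1.0, 1.0) for _ in range(3)]
            predicted = dynamics_rollout(state, step, params)
            if satisfies_constraints(predicted):
                full_plan.append((step, params))
                state = predicted
                break
        else:
            return None  # no feasible parameters found for this step
    return full_plan
```

Because the dynamics model is only ever queried one step at a time, chaining its single-step predictions in this loop is what lets a model trained on single-step data be composed into plans of arbitrary length.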