Learn from the Past: Language-conditioned Object Rearrangement with Large Language Models

📅 2025-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing robotic rearrangement methods exhibit limited flexibility in interpreting natural-language instructions and reasoning about placement locations in open-world scenarios. This paper proposes a zero-shot object rearrangement framework for collaborative robots, capable of parsing unconstrained language commands and inferring target layouts without task-specific pretraining data. Its core contribution is an analogy-based reasoning mechanism centred on a large language model (LLM), which retrieves historically successful rearrangement cases and employs joint language-action prompting with dynamic memory augmentation. This design enables strong generalisation to unseen object categories, complex multi-step instructions, and sequential rearrangement tasks. Experiments demonstrate significant improvements in task success rate and cross-task transfer in multi-object, open-layout environments, overcoming the traditional reliance on structured instructions and task-specific supervised training.
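The retrieve-then-prompt loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names (`Case`, `retrieve_cases`, `build_prompt`) are hypothetical, and a simple bag-of-words cosine similarity stands in for whatever retrieval the paper actually uses.

```python
# Hypothetical sketch of analogy-based reasoning for object rearrangement:
# retrieve similar past cases, then prompt an LLM with them as in-context
# examples. All names here are illustrative assumptions.
from dataclasses import dataclass
from collections import Counter
import math

@dataclass
class Case:
    instruction: str   # past natural-language command
    placement: str     # goal position that succeeded for that command

def _cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for learned embeddings)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_cases(memory: list[Case], query: str, k: int = 2) -> list[Case]:
    """Return the k past cases most similar to the new instruction."""
    ranked = sorted(memory, key=lambda c: _cosine(c.instruction, query),
                    reverse=True)
    return ranked[:k]

def build_prompt(memory: list[Case], query: str) -> str:
    """Joint language-action prompt: retrieved examples plus the new command."""
    examples = "\n".join(
        f"Instruction: {c.instruction}\nPlacement: {c.placement}"
        for c in retrieve_cases(memory, query)
    )
    return f"{examples}\nInstruction: {query}\nPlacement:"

memory = [
    Case("put the mug next to the plate", "right of plate"),
    Case("stack the red block on the blue block", "on top of blue block"),
]
prompt = build_prompt(memory, "put the cup next to the bowl")
# After the LLM proposes a placement and it succeeds, the new
# (instruction, placement) pair would be appended to `memory` --
# the "dynamic memory augmentation" step.
```

An actual LLM call would then complete the prompt; appending each success back into `memory` is what lets later, unseen instructions draw on earlier tasks.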

📝 Abstract
Object rearrangement is a significant task for collaborative robots, where they are directed to manipulate objects into a specified goal state. Determining the placement of objects is a major challenge that influences the efficiency of the rearrangement process. Most current methods rely heavily on pre-collected datasets to train a model to predict the goal position and are restricted to specific instructions, which limits their broader applicability and effectiveness. In this paper, we propose a framework for language-conditioned object rearrangement based on a Large Language Model (LLM). In particular, our approach mimics human reasoning by using past successful experiences as a reference to infer the desired goal position. Building on the LLM's strong natural-language comprehension and inference ability, our method can generalise to various everyday objects and free-form language instructions in a zero-shot manner. Experimental results demonstrate that our method can effectively execute robotic rearrangement tasks, even those involving long sequential orders.
Problem

Research questions and friction points this paper is trying to address.

Natural Language Processing
Object Placement Prediction
Robotics Flexibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Robotics Sorting
Natural Language Instructions
Guanqun Cao
Department of Computer Science, University of York, York, YO10 5DD, United Kingdom
Ryan Mckenna
Department of Computer Science, University of York, York, YO10 5DD, United Kingdom
John Oyekan
Associate Professor, The University of York
Digital Manufacturing, Human-in-the-loop, Human-centred Algorithms, Flexible Automation, Industry 5.0