GOPLA: Generalizable Object Placement Learning via Synthetic Augmentation of Human Arrangement

📅 2025-10-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
General object placement for home-service robots must satisfy both semantic plausibility (e.g., commonsense object relationships) and geometric feasibility (e.g., collision-free positioning and kinematic reachability). This paper proposes a hierarchical framework integrating: (i) a multimodal large language model for instruction and scene understanding; (ii) a spatial mapper and 3D reachability graph to encode physical constraints; (iii) a diffusion model for initial pose generation; and (iv) collision-aware test-time optimization to enhance physical robustness. The paper further introduces a scalable synthetic data augmentation pipeline that expands human placement demonstrations into diverse training data. Evaluated on real robotic platforms, the method generalizes across scenes, improving placement success rate (assessed on positioning accuracy and physical plausibility) by 30.04 percentage points over the strongest baseline, and advancing autonomous household organization in complex, unstructured domestic environments.
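The paper itself does not provide implementation details in this summary, but the idea behind step (iv), collision-aware test-time optimization steering a diffusion sampler, can be illustrated with a toy sketch. Everything below is an assumption for illustration: the 2D pose, the quadratic penetration cost, the finite-difference gradient, and the placeholder `denoise_step` all stand in for the paper's actual learned components.

```python
import numpy as np

def collision_cost(pose, obstacles, radius=0.05):
    # Toy stand-in for a collision cost: penalize a 2D placement `pose`
    # that falls within `radius` of any obstacle center.
    d = np.linalg.norm(obstacles - pose, axis=1)
    return float(np.sum(np.maximum(radius - d, 0.0) ** 2))

def cost_gradient(cost_fn, pose, eps=1e-4):
    # Central finite-difference gradient of the cost w.r.t. the pose.
    grad = np.zeros_like(pose)
    for i in range(pose.size):
        step = np.zeros_like(pose)
        step[i] = eps
        grad[i] = (cost_fn(pose + step) - cost_fn(pose - step)) / (2 * eps)
    return grad

def guided_sampling(init_pose, denoise_step, obstacles, steps=50, guidance=0.5):
    # Alternate a (placeholder) diffusion denoising update with a gradient
    # step on the collision cost, steering samples toward free space.
    pose = init_pose.copy()
    for t in range(steps):
        pose = denoise_step(pose, t)  # stands in for the learned diffusion model
        grad = cost_gradient(lambda p: collision_cost(p, obstacles), pose)
        pose = pose - guidance * grad
    return pose
```

With an identity `denoise_step`, a pose initialized inside an obstacle's collision radius is pushed out by the cost gradient alone; in the full method this guidance would be combined with the learned denoising trajectory.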

📝 Abstract
Robots are expected to serve as intelligent assistants, helping humans with everyday household organization. A central challenge in this setting is the task of object placement, which requires reasoning about both semantic preferences (e.g., common-sense object relations) and geometric feasibility (e.g., collision avoidance). We present GOPLA, a hierarchical framework that learns generalizable object placement from augmented human demonstrations. A multi-modal large language model translates human instructions and visual inputs into structured plans that specify pairwise object relationships. These plans are then converted into 3D affordance maps with geometric common sense by a spatial mapper, while a diffusion-based planner generates placement poses guided by test-time costs, considering multi-plan distributions and collision avoidance. To overcome data scarcity, we introduce a scalable pipeline that expands human placement demonstrations into diverse synthetic training data. Extensive experiments show that our approach improves placement success rates by 30.04 percentage points over the runner-up, evaluated on positioning accuracy and physical plausibility, demonstrating strong generalization across a wide range of real-world robotic placement scenarios.
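To make the spatial-mapper step concrete: the abstract says pairwise relation plans are converted into 3D affordance maps with geometric common sense. A rough 2D illustration of rasterizing a single relation into a grid of placement scores follows; the function name, the Gaussian scoring, and the distance thresholds are illustrative assumptions, not the paper's mapper, which is 3D and learned.

```python
import numpy as np

def affordance_map(anchor_xy, relation, grid_size=17, extent=0.5):
    """Toy 2D stand-in for a spatial mapper: rasterize one pairwise
    relation ('on' or 'near') about an anchor object into a grid of
    placement scores over a workspace of +/- `extent` meters."""
    xs = np.linspace(-extent, extent, grid_size)
    gx, gy = np.meshgrid(xs, xs, indexing="ij")
    d = np.hypot(gx - anchor_xy[0], gy - anchor_xy[1])
    if relation == "on":
        return np.exp(-(d / 0.05) ** 2)            # peak directly over the anchor
    if relation == "near":
        return np.exp(-((d - 0.15) / 0.05) ** 2)   # ring ~0.15 m from the anchor
    raise ValueError(f"unknown relation: {relation!r}")
```

Maps for several pairwise relations could then be multiplied or summed to score candidate placements, giving the diffusion planner a dense target distribution rather than a single goal pose.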
Problem

Research questions and friction points this paper is trying to address.

Learning generalizable object placement from augmented human demonstrations
Reasoning about semantic preferences and geometric feasibility simultaneously
Overcoming data scarcity through synthetic augmentation of human arrangements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical framework learns object placement from augmented demonstrations
Multi-modal LLM translates instructions into structured object relationship plans
Scalable pipeline expands human demonstrations into diverse synthetic training data