PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes

📅 2025-05-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper introduces the task of language-guided 3D object placement: given a point cloud of a real-world scene, a 3D asset to be placed, and a natural-language prompt, the goal is to predict a 6-DoF pose for the asset that is both semantically plausible and geometrically feasible, meaning the object is supported, does not collide with the scene, and makes efficient use of free space. The paper presents itself as the first systematic treatment of three challenges: multiple valid solutions under high ambiguity, alignment between geometry and language, and reasoning about free space. To this end, the authors establish the first dedicated benchmark (a curated dataset, a standardized evaluation protocol, and baseline methods) and propose an end-to-end trainable 3D LLM framework that integrates multimodal feature alignment, explicit 3D spatial-relation modeling, and language-guided pose optimization. Experiments report substantial improvements over heuristic baselines, and the benchmark is positioned as a foundation for evaluating localization capabilities in general-purpose 3D foundation models.
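
The summary names two geometric conditions a feasible placement must satisfy: the object must be supported, and it must not collide with the scene. The following is a hypothetical, simplified checker for just those two conditions, not the paper's evaluation protocol. It assumes a z-up point cloud, restricts the pose to a translation plus a rotation about the vertical axis (yaw) rather than a full 6-DoF pose, and approximates the placed asset by its axis-aligned bounding box. The function names and the `tol` and `min_support` thresholds are illustrative assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def transform(points: List[Point], translation: Point, yaw: float) -> List[Point]:
    """Rotate points about the vertical (z) axis by `yaw` radians, then translate."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]


def bbox(points: List[Point]) -> Tuple[Point, Point]:
    """Axis-aligned bounding box as (min corner, max corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def is_valid_placement(scene: List[Point], asset: List[Point],
                       translation: Point, yaw: float,
                       tol: float = 0.02, min_support: int = 3) -> bool:
    """Hypothetical check: supported from below and collision-free (box approximation)."""
    placed = transform(asset, translation, yaw)
    (x0, y0, z0), (x1, y1, z1) = bbox(placed)
    support = 0
    for x, y, z in scene:
        # Only scene points inside the asset's footprint matter for either test.
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue
        if abs(z - z0) <= tol:
            support += 1  # surface point just under the asset's base
        elif z0 + tol < z < z1:
            return False  # scene point inside the asset's volume: collision
    # Require a few surface points under the base so the object does not float.
    return support >= min_support
```

For example, with a flat floor sampled at z = 0 and a cube-shaped asset, placing the cube on an empty patch of floor passes, placing it over an obstacle point fails the collision test, and lifting it 0.3 units above the floor fails the support test. A single bounding box over-reports collisions for non-box-shaped assets; a finer volumetric test such as a voxel occupancy grid would be the natural refinement.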

📝 Abstract
We introduce the novel task of Language-Guided Object Placement in Real 3D Scenes. Our model is given a 3D scene's point cloud, a 3D asset, and a textual prompt broadly describing where the 3D asset should be placed. The task is to find a valid placement for the 3D asset that respects the prompt. Compared with other language-guided localization tasks in 3D scenes, such as grounding, this task has specific challenges: it is ambiguous because it has multiple valid solutions, and it requires reasoning about 3D geometric relationships and free space. We inaugurate this task by proposing a new benchmark and evaluation protocol. We also introduce a new dataset for training 3D LLMs on this task, as well as the first method to serve as a non-trivial baseline. We believe that this challenging task and our new benchmark could become part of the suite of benchmarks used to evaluate and compare generalist 3D LLMs.
Problem

Research questions and friction points this paper is trying to address.

Language-guided 3D object placement in real scenes
Resolving ambiguous multiple valid placement solutions
Reasoning about 3D geometric relationships and free space
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-guided 3D object placement in scenes
Novel benchmark for 3D LLM evaluation
Dataset for training 3D LLMs on placement reasoning