Controllable 3D Placement of Objects with Scene-Aware Diffusion Models

📅 2025-06-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the challenge of precise object placement and controllable editing in 3D scenes. Methodologically, it introduces a scene-aware diffusion model framework featuring depth/normal-guided visual conditioning and coarse-grained mask-driven local generation to decouple object editing from background preservation; cross-modal text–vision alignment enables joint control over position, pose, and non-rigid deformation. Key contributions include: (i) the first lightweight, geometry-aware visual conditioning signal explicitly designed for 3D object placement—requiring neither fine-grained masks nor complex prompt engineering; and (ii) a multi-dimensional evaluation benchmark for editing quality, specifically constructed for automotive scenes. Experiments demonstrate substantial improvements over baselines: 42% reduction in positional error and 37% reduction in pose error, alongside significant gains in geometric plausibility and background consistency.

Technology Category

Application Category

📝 Abstract
Image editing approaches have become more powerful and flexible with the advent of powerful text-conditioned generative models. However, placing objects in an environment with a precise location and orientation still remains a challenge, as this typically requires carefully crafted inpainting masks or prompts. In this work, we show that a carefully designed visual map, combined with coarse object masks, is sufficient for high quality object placement. We design a conditioning signal that resolves ambiguities, while being flexible enough to allow for changing of shapes or object orientations. By building on an inpainting model, we leave the background intact by design, in contrast to methods that model objects and background jointly. We demonstrate the effectiveness of our method in the automotive setting, where we compare different conditioning signals in novel object placement tasks. These tasks are designed to measure edit quality not only in terms of appearance, but also in terms of pose and location accuracy, including cases that require non-trivial shape changes. Lastly, we show that fine location control can be combined with appearance control to place existing objects in precise locations in a scene.
Problem

Research questions and friction points this paper is trying to address.

Precise 3D object placement in scenes using visual maps
Resolving ambiguities in object shape and orientation control
Maintaining background integrity while editing object locations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scene-aware diffusion models for 3D placement
Visual map and coarse masks conditioning
Inpainting model preserves background integrity
🔎 Similar Papers
No similar papers found.
Mohamed Omran
Mohamed Omran
Max Planck Institute for Informatics
Computer VisionMachine Learning
D
Dimitris Kalatzis
Qualcomm AI Research
J
Jens Petersen
Qualcomm AI Research
A
Amirhossein Habibian
Qualcomm AI Research
Auke Wiggers
Auke Wiggers
Qualcomm AI Research
Generative modelsNeural compression