AI Summary
This paper addresses the challenge of interactive geometric editing in generated images. We propose a variable-granularity scene representation based on convex 3D primitives, enabling users to manipulate simple geometric elements (e.g., cubes, prisms) to modify scene structure. By integrating depth estimation with geometry-aware texture prompting, our method drives flow-based image generation toward geometrically consistent reconstruction. Unifying differentiable rendering and 3D assembly, the framework supports flexible editing, from global layout to local details. Compared with prior approaches, our method achieves significant improvements in visual fidelity, editing controllability, and compositional generalization. Notably, it excels at preserving object identity, maintaining material consistency, and accurately modeling camera and object motion.
Abstract
We describe Generative Blocks World, a method for interacting with the scene of a generated image by manipulating simple geometric abstractions. Our method represents scenes as assemblies of convex 3D primitives, and the same scene can be represented by different numbers of primitives, allowing an editor to move either whole structures or small details. Once the scene geometry has been edited, the image is generated by a flow-based method conditioned on depth and a texture hint. Our texture hint takes the modified 3D primitives into account, exceeding the texture consistency provided by existing key-value caching techniques. These texture hints (a) allow accurate object and camera moves and (b) largely preserve the identity of depicted objects. Quantitative and qualitative experiments demonstrate that our approach outperforms prior work in visual fidelity, editability, and compositional generalization.
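The core representation (scenes as assemblies of convex 3D primitives that an editor can move) can be illustrated with a minimal sketch. This is not the paper's actual data structure or API; it only shows the standard half-space view of a convex solid, in which containment is a set of linear inequalities and a rigid translation simply shifts each plane offset:

```python
import numpy as np

# Illustrative sketch only: a convex primitive as an intersection of
# half-spaces n . x <= d. Class and function names are hypothetical.

class ConvexPrimitive:
    def __init__(self, normals, offsets):
        self.normals = np.asarray(normals, dtype=float)  # (K, 3) outward plane normals
        self.offsets = np.asarray(offsets, dtype=float)  # (K,) plane offsets

    def contains(self, point):
        # A point lies inside iff it satisfies every half-space constraint.
        return bool(np.all(self.normals @ np.asarray(point, dtype=float) <= self.offsets))

    def translate(self, delta):
        # Translating the solid by delta shifts each offset by n . delta.
        new_offsets = self.offsets + self.normals @ np.asarray(delta, dtype=float)
        return ConvexPrimitive(self.normals, new_offsets)

def axis_aligned_cube(center, half_size):
    # A cube is the intersection of six axis-aligned half-spaces.
    normals = np.vstack([np.eye(3), -np.eye(3)])
    offsets = normals @ np.asarray(center, dtype=float) + half_size
    return ConvexPrimitive(normals, offsets)

# A "scene" is just a list of such primitives; an edit moves one of them.
scene = [axis_aligned_cube((0.0, 0.0, 0.0), 1.0)]
moved = scene[0].translate((2.0, 0.0, 0.0))
print(scene[0].contains((0.0, 0.0, 0.0)))  # True
print(moved.contains((0.0, 0.0, 0.0)))     # False
print(moved.contains((2.0, 0.0, 0.0)))     # True
```

In this view, editing granularity corresponds to how many primitives the scene is decomposed into: a coarse fit uses a few large convex solids, while a fine fit uses many small ones.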