ShapeShift: Towards Text-to-Shape Arrangement Synthesis with Content-Aware Geometric Constraints

📅 2025-03-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses text-driven rearrangement of rigid geometric shapes: given a fixed set of shapes and a natural language description, the goal is to produce a non-overlapping, semantically consistent, and physically plausible vector composition. The method introduces a content-aware collision resolution mechanism and couples diffusion-based semantic guidance with explicit geometric constraints, such as non-overlap and spatial relations, inside an end-to-end differentiable vector generation pipeline. The key technical components are Score Distillation Sampling for semantic alignment, differentiable vector rendering, and semantics-aware overlap detection and correction. Experiments show that the approach outperforms existing baselines across diverse text-to-shape arrangement tasks. The generated compositions are physically valid, exhibit well-defined spatial relationships, and reflect the meaning of the prompt. Both quantitative evaluations and visual assessments report improvements in accuracy, constraint satisfaction, and perceptual quality.

📝 Abstract

While diffusion-based models excel at generating photorealistic images from text, a more nuanced challenge emerges when constrained to using only a fixed set of rigid shapes, akin to solving tangram puzzles or arranging real-world objects to match semantic descriptions. We formalize this problem as shape-based image generation, a new text-guided image-to-image translation task that requires rearranging the input set of rigid shapes into non-overlapping configurations and visually communicating the target concept. Unlike pixel-manipulation approaches, our method, ShapeShift, explicitly parameterizes each shape within a differentiable vector graphics pipeline, iteratively optimizing placement and orientation through score distillation sampling from pretrained diffusion models. To preserve arrangement clarity, we introduce a content-aware collision resolution mechanism that applies minimal semantically coherent adjustments when overlaps occur, ensuring smooth convergence toward physically valid configurations. By bridging diffusion-based semantic guidance with explicit geometric constraints, our approach yields interpretable compositions where spatial relationships clearly embody the textual prompt. Extensive experiments demonstrate compelling results across diverse scenarios, with quantitative and qualitative advantages over alternative techniques.
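
The abstract says each shape is explicitly parameterized and that only its placement and orientation are optimized. A minimal sketch of what such a rigid parameterization could look like is given below. The class name `RigidShape`, the fields `tx`, `ty`, `theta`, and the helper `edge_lengths` are hypothetical names chosen for illustration and are not taken from the paper. The point is that changing the pose never changes the outline.

```python
import math
from dataclasses import dataclass


@dataclass
class RigidShape:
    """A polygon with a fixed outline; only its pose is free (illustrative, not the paper's code)."""
    local_vertices: list  # (x, y) pairs in the shape's own frame, never modified
    tx: float = 0.0       # translation along x
    ty: float = 0.0       # translation along y
    theta: float = 0.0    # rotation in radians

    def world_vertices(self):
        """Apply rotation, then translation, to every local vertex."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return [(c * x - s * y + self.tx, s * x + c * y + self.ty)
                for x, y in self.local_vertices]


def edge_lengths(points):
    """Lengths of the polygon's edges, used here to check rigidity."""
    n = len(points)
    return [math.dist(points[i], points[(i + 1) % n]) for i in range(n)]


triangle = RigidShape([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
before = edge_lengths(triangle.world_vertices())

# Move and rotate the shape; the outline itself is untouched.
triangle.tx, triangle.ty, triangle.theta = 2.0, -1.0, math.pi / 3
after = edge_lengths(triangle.world_vertices())
```

Because the optimizer would only ever touch the three pose values per shape, edge lengths before and after the move agree up to floating-point error, which is what "rigid" means in this setting.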

Problem

Research questions and friction points this paper is trying to address.

Generate images from text using only a fixed set of rigid shapes

Optimize shape placement and orientation via diffusion models

Ensure non-overlapping, semantically coherent shape arrangements
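
The non-overlap requirement in the last item can be illustrated with a deliberately simplified sketch. It replaces each shape with a bounding circle and pushes two overlapping circles apart by the smallest distance that makes them touch. This is a swapped-in stand-in, not the paper's content-aware mechanism: the circle proxy, the function name `resolve_overlap`, and the equal split of the correction between both shapes are all assumptions made for illustration.

```python
import math


def resolve_overlap(center_a, radius_a, center_b, radius_b):
    """Return new centers, moved apart just enough that the two circles touch.

    Circles are a crude proxy for real shapes (an assumption of this sketch).
    """
    dx = center_b[0] - center_a[0]
    dy = center_b[1] - center_a[1]
    dist = math.hypot(dx, dy)
    penetration = radius_a + radius_b - dist
    if penetration <= 0:
        # Already separated or exactly touching: leave both centers unchanged.
        return center_a, center_b
    if dist == 0:
        # Coincident centers have no defined direction; pick the x axis.
        ux, uy = 1.0, 0.0
    else:
        ux, uy = dx / dist, dy / dist
    half = penetration / 2  # each circle takes half of the correction
    new_a = (center_a[0] - ux * half, center_a[1] - uy * half)
    new_b = (center_b[0] + ux * half, center_b[1] + uy * half)
    return new_a, new_b


a, b = resolve_overlap((0.0, 0.0), 1.0, (1.0, 0.0), 1.0)
```

A content-aware variant, as the abstract describes it, would choose adjustments that also preserve the semantic layout rather than splitting the correction evenly between the two shapes.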

Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentiable vector graphics pipeline optimization

Content-aware collision resolution mechanism

Score distillation from pretrained diffusion models
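
For reference, the score distillation gradient is usually written in the form below. This is the standard formulation from the score distillation literature, adapted here under the assumption that the optimized parameters \theta are the shape poses and that R is the differentiable vector renderer. The notation is illustrative, and the paper's exact weighting or any latent-space variant may differ.

```latex
\nabla_{\theta} \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_{\phi}(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad x = R(\theta), \quad x_t = \alpha_t x + \sigma_t \epsilon
```

Here y is the text prompt, \hat{\epsilon}_{\phi} is the pretrained diffusion model's noise prediction, \epsilon is the sampled Gaussian noise, and w(t) is a timestep-dependent weight. The gradient reaches the pose parameters only through \partial x / \partial \theta, which is why a differentiable renderer is needed.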

🔎 Similar Papers

SceneMotifCoder: Example-driven Visual Program Learning for Generating 3D Object Arrangements