CompoSE: Compositional Synthesis and Editing of 3D Shapes via Part-Aware Control

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

High-fidelity, part-level controllable 3D content generation and editing remain key challenges in computer graphics. To address them, this work proposes CompoSE, a method built on a diffusion Transformer architecture with alternating local-global attention. From user-provided coarse part layouts (such as bounding boxes), it automatically infers semantic structure and symmetry, without requiring part-level textual prompts. A novel layout-conditioning mechanism ensures strong adherence to the input constraints. CompoSE enables fine-grained, context-aware editing operations, including part replacement, addition, removal, and style-preserving scaling. Experiments show that CompoSE significantly outperforms existing approaches in guided 3D synthesis, as measured by both quantitative metrics and large language model–based evaluations.
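To make the input side concrete, here is a minimal Python sketch of a coarse part layout and of the layout-level edits the summary lists (addition, removal, resizing). Everything in it is an assumption for illustration: the `PartBox` type, the function names, and the axis-aligned box parameterization are hypothetical and are not taken from the paper. The part ids are opaque bookkeeping keys, not text prompts, since the method is described as needing no part-level text.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class PartBox:
    """Axis-aligned box standing in for one object part (hypothetical type)."""
    center: Vec3
    size: Vec3


# Part id -> box. Ids are opaque keys, not semantic labels or prompts.
Layout = Dict[str, PartBox]


def add_part(layout: Layout, part_id: str, box: PartBox) -> Layout:
    """Return a new layout with one extra part box."""
    return {**layout, part_id: box}


def remove_part(layout: Layout, part_id: str) -> Layout:
    """Return a new layout without the given part."""
    return {k: v for k, v in layout.items() if k != part_id}


def resize_part(layout: Layout, part_id: str, scale: Vec3) -> Layout:
    """Return a new layout with one box scaled per axis about its center."""
    box = layout[part_id]
    new_size = tuple(s * f for s, f in zip(box.size, scale))
    return {**layout, part_id: replace(box, size=new_size)}


# A two-part layout (illustrative numbers only).
chair: Layout = {
    "p0": PartBox(center=(0.0, 0.5, 0.0), size=(1.0, 0.1, 1.0)),
    "p1": PartBox(center=(0.0, 1.0, -0.45), size=(1.0, 0.9, 0.1)),
}
taller = resize_part(chair, "p1", (1.0, 1.5, 1.0))
```

Under this reading, an edit only changes the layout; a generative model would then be asked to re-synthesize the affected parts in the context of the unchanged ones. That second step is the paper's contribution and is not modeled here.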

📝 Abstract

Creating and editing high-quality 3D content remains a central challenge in computer graphics. We address this challenge by introducing CompoSE, a novel method for Compositional Synthesis and Editing of 3D shapes via part-aware control. Our method takes as input a set of coarse geometric primitives (e.g., bounding boxes) that represent distinct object parts arranged in a particular spatial configuration, and synthesizes as output part-separated 3D objects that support localized, granular (i.e., compositional) editing of individual parts. The key insight that enables our method is the use of a diffusion transformer architecture that alternates between processing each part locally and aggregating contextual information across parts globally, combined with a novel conditioning technique that ensures strong adherence to the user's input. Importantly, our method learns to infer part semantics and symmetries directly from the user's coarse layout guidance, and does not require part-level text prompts. We demonstrate that our method enables powerful part-level editing capabilities, including context-aware substitution, addition, deletion, and style-preserving resizing operations. We show through extensive experiments that our method significantly outperforms existing approaches on guided synthesis, as measured by objective metrics and LLM-based evaluations.
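The alternation between per-part (local) and cross-part (global) processing described in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' architecture: the single-head attention with identity projections, the residual connection, and all function names are assumptions chosen for brevity, and the diffusion timestep and the layout conditioning are omitted entirely.

```python
import math
from typing import List

Token = List[float]
Part = List[Token]  # the tokens that belong to one part


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def self_attention(tokens: List[Token]) -> List[Token]:
    """Single-head self-attention with identity Q/K/V and a residual add."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in tokens]
        weights = softmax(scores)
        mixed = [sum(w * v[d] for w, v in zip(weights, tokens))
                 for d in range(dim)]
        out.append([x + y for x, y in zip(q, mixed)])
    return out


def local_step(parts: List[Part]) -> List[Part]:
    """Each part attends only to its own tokens."""
    return [self_attention(p) for p in parts]


def global_step(parts: List[Part]) -> List[Part]:
    """All tokens attend across parts, then are split back per part."""
    flat = [t for p in parts for t in p]
    mixed = self_attention(flat)
    out, i = [], 0
    for p in parts:
        out.append(mixed[i:i + len(p)])
        i += len(p)
    return out


def alternating_blocks(parts: List[Part], depth: int) -> List[Part]:
    """Apply `depth` rounds of local attention followed by global attention."""
    for _ in range(depth):
        parts = global_step(local_step(parts))
    return parts


# Two parts with two tokens and one token respectively (toy values).
parts = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
result = alternating_blocks(parts, depth=2)
```

In the local step, a part's output depends only on its own tokens, so a one-token part simply attends to itself. In the global step, the same token also mixes in information from the other part, which is the mechanism by which context could propagate between parts.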

Problem

Research questions and friction points this paper is trying to address.

3D shape synthesis

compositional editing

part-aware control

3D content creation

localized editing

Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional synthesis

part-aware control

diffusion transformer