TRELLISWorld: Training-Free World Generation from Object Generators

📅 2025-10-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text-to-3D scene generation methods are largely restricted to single-object synthesis, rely on domain-specific training data, or fail to ensure 360° view consistency. To address these limitations, we propose the first training-free, modular 3D scene synthesis framework. Our approach treats pre-trained text-to-3D object diffusion models as composable generative units and synthesizes large-scale, omnidirectional scenes via overlapping multi-patch sampling, semantics-aware collaborative denoising, and weighted fusion. Crucially, it requires no scene-level training data or model fine-tuning. The framework enables flexible layout editing and localized control while preserving semantic coherence across viewpoints. Quantitative and qualitative evaluations demonstrate significant improvements in visual fidelity and full-view consistency over prior art. This work establishes a scalable, efficient paradigm for open-domain 3D content generation, bridging compositional reasoning with diffusion-based synthesis without retraining.
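The core mechanism described above is collaborative denoising over overlapping 3D patches whose predictions are fused at every step. The sketch below illustrates that loop in Python/NumPy; the `denoise_step` function, latent shape, tile slicing, and prompts are hypothetical stand-ins for illustration, not the paper's actual implementation.

```python
import numpy as np

def denoise_step(tile_latent, prompt, t):
    """Stand-in for one reverse-diffusion step of a pre-trained
    text-to-3D object model applied to a single tile (hypothetical)."""
    return 0.95 * tile_latent  # placeholder update

def collaborative_denoise(latent, tiles, prompts, timesteps):
    """Denoise overlapping tiles of a shared scene latent and fuse the
    per-tile predictions by weighted averaging after every step."""
    for t in timesteps:
        fused = np.zeros_like(latent)
        weight = np.zeros_like(latent)
        for sl, prompt in zip(tiles, prompts):
            pred = denoise_step(latent[sl], prompt, t)
            fused[sl] += pred     # accumulate tile prediction
            weight[sl] += 1.0     # count contributions per voxel
        latent = fused / np.maximum(weight, 1e-8)  # average in overlaps
    return latent

# Toy usage: a 64^3 latent grid covered by two tiles overlapping by 16 voxels.
latent = np.random.randn(64, 64, 64, 8).astype(np.float32)
tiles = [np.s_[:, :, :40], np.s_[:, :, 24:]]
prompts = ["a wooden cabin", "a pine forest clearing"]
scene = collaborative_denoise(latent, tiles, prompts, timesteps=range(50))
```

Because every tile is denoised against the shared latent at every step, overlapping regions stay consistent instead of drifting toward independent solutions.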

📝 Abstract
Text-driven 3D scene generation holds promise for a wide range of applications, from virtual prototyping to AR/VR and simulation. However, existing methods are often constrained to single-object generation, require domain-specific training, or lack support for full 360-degree viewability. In this work, we present a training-free approach to 3D scene synthesis by repurposing general-purpose text-to-3D object diffusion models as modular tile generators. We reformulate scene generation as a multi-tile denoising problem, where overlapping 3D regions are independently generated and seamlessly blended via weighted averaging. This enables scalable synthesis of large, coherent scenes while preserving local semantic control. Our method eliminates the need for scene-level datasets or retraining, relies on minimal heuristics, and inherits the generalization capabilities of object-level priors. We demonstrate that our approach supports diverse scene layouts, efficient generation, and flexible editing, establishing a simple yet powerful foundation for general-purpose, language-driven 3D scene construction.
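The abstract's "seamlessly blended via weighted averaging" can be made concrete with a simple 1-D illustration. The linear ramp window below is an assumption chosen for illustration only; the paper's actual weighting scheme may differ.

```python
import numpy as np

def tile_weight(size, overlap):
    """Per-tile weight that tapers linearly over the overlap at each end,
    so neighboring tiles hand off smoothly after normalization."""
    w = np.ones(size)
    ramp = np.linspace(1.0 / overlap, 1.0, overlap)
    w[:overlap] = ramp
    w[-overlap:] = ramp[::-1]
    return w

def blend_pair(tile_a, tile_b, overlap):
    """Blend two 1-D tiles that share `overlap` samples at the seam."""
    n = len(tile_a) + len(tile_b) - overlap
    out = np.zeros(n)
    acc = np.zeros(n)
    wa = tile_weight(len(tile_a), overlap)
    wb = tile_weight(len(tile_b), overlap)
    out[:len(tile_a)] += wa * tile_a
    acc[:len(tile_a)] += wa
    out[-len(tile_b):] += wb * tile_b
    acc[-len(tile_b):] += wb
    return out / acc  # weighted average; the seam transitions gradually

# Example: two constant tiles with different values blend smoothly at the seam.
seam = blend_pair(np.full(32, 1.0), np.full(32, 2.0), overlap=8)
```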
Problem

Research questions and friction points this paper is trying to address.

How to generate coherent 3D scenes from object-level diffusion models without any scene-level training
How to blend independently denoised, overlapping tiles into a consistent 360-degree scene
How to scale synthesis while retaining local semantic control in the absence of scene-level datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free 3D scene synthesis using object diffusion models
Multi-tile denoising with overlapping region blending
Scalable generation preserving local semantic control
Hanke Chen
Carnegie Mellon University
Yuan Liu
The Hong Kong University of Science and Technology
Minchen Li
CMU, Genesis AI
Computer Graphics, Visual Computing, Robotics, Computational Mechanics