Map2World: Segment Map Conditioned Text to 3D World Generation

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods for 3D world generation are constrained by grid layouts and produce inconsistent object scales across the world, which limits user controllability. This work proposes Map2World, a framework that generates 3D worlds conditioned on user-defined segment maps of arbitrary shape and scale, which the authors present as the first approach with this capability. Map2World combines segment-map conditioning, a detail enhancer network that uses global structure information, and strong priors from asset generators, so that it preserves overall scene coherence while adding fine-grained details. The reported experiments show that Map2World significantly outperforms existing methods in user controllability, scale consistency, and content coherence, enabling 3D world generation under more complex conditions.

📝 Abstract

3D world generation is essential for applications such as immersive content creation and autonomous driving simulation. Recent advances in 3D world generation have shown promising results; however, these methods are constrained by grid layouts and suffer from inconsistent object scale across the world. In this work, we introduce a novel framework, Map2World, which is the first to enable 3D world generation conditioned on user-defined segment maps of arbitrary shapes and scales, ensuring global scale consistency and flexibility across expansive environments. To further enhance quality, we propose a detail enhancer network that generates fine details of the world. By incorporating global structure information, the detail enhancer adds fine-grained details without compromising overall scene coherence. We design the entire pipeline to leverage strong priors from asset generators, achieving robust generalization across diverse domains even with limited training data for scene generation. Extensive experiments demonstrate that our method significantly outperforms existing approaches in user controllability, scale consistency, and content coherence, enabling users to generate 3D worlds under more complex conditions.
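
To make the idea of a user-defined segment map of arbitrary shape and scale concrete, here is a minimal sketch of one way such an input could be represented. It is not the paper's data format: the label ids, the `VOID` convention for cells outside the world, the `metres_per_cell` parameter, and both function names are assumptions made for illustration. The map is a 2D grid of class labels in which the non-void cells form an L shape, and a single global scale converts cell counts into physical area.

```python
from collections import Counter

# Hypothetical label ids; the paper's actual class set is not given here.
VOID, GRASS, WATER, BUILDING = 0, 1, 2, 3
LABEL_NAMES = {GRASS: "grass", WATER: "water", BUILDING: "building"}


def make_l_shaped_map():
    """Build a 6x8 label grid whose non-void cells form an L shape."""
    rows, cols = 6, 8
    seg = [[VOID] * cols for _ in range(rows)]
    for r in range(rows):          # vertical bar of the L
        for c in range(3):
            seg[r][c] = GRASS
    for r in range(4, 6):          # horizontal bar of the L
        for c in range(3, 8):
            seg[r][c] = WATER
    for r in range(1, 3):          # a building footprint inside the grass
        seg[r][1] = BUILDING
    return seg


def region_areas(seg, metres_per_cell):
    """Return the area in square metres covered by each non-void label."""
    cell_area = metres_per_cell ** 2
    counts = Counter(label for row in seg for label in row if label != VOID)
    return {LABEL_NAMES[label]: n * cell_area for label, n in counts.items()}


if __name__ == "__main__":
    print(region_areas(make_l_shaped_map(), metres_per_cell=2.0))
```

Because every cell shares one `metres_per_cell` value, region sizes stay comparable across the whole map, which is the kind of global scale consistency the abstract refers to. How Map2World itself encodes shape and scale is not specified in this summary.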

Problem

Research questions and friction points this paper is trying to address.

3D world generation

scale consistency

user controllability

scene coherence

segment map conditioning

Innovation

Methods, ideas, or system contributions that make the work stand out.

segment map conditioning

3D world generation

detail enhancer network