NuiWorld: Exploring a Scalable Framework for End-to-End Controllable World Generation

๐Ÿ“… 2026-01-27
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work proposes NuiWorld, a novel framework addressing key limitations of existing end-to-end world generation methodsโ€”namely data scarcity, fixed-resolution representations, and high inference overhead. NuiWorld leverages generative bootstrapping to synthesize diverse scene data from a small set of images and incorporates pseudo-sketch labels to enable layout-controllable generation. It introduces a flattened, variable-size vector set representation that supports efficient and consistent multi-scale scene modeling. The framework substantially improves both training and inference efficiency while demonstrating strong generalization to unseen sketches. NuiWorld generates virtual worlds with high geometric fidelity and precise layout control, advancing the state of the art in controllable 3D scene synthesis.
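The generative bootstrapping described above can be pictured as a simple data-synthesis loop: lift a seed image to 3D, expand it to a scene of some target size, and derive a pseudo-sketch label for supervision. The Python sketch below is a hypothetical illustration only; every name in it (Scene, reconstruct_3d, expand_scene, render_pseudo_sketch) is a placeholder assumption, not the authors' actual code or API.

```python
"""Minimal sketch of a generative-bootstrapping data pipeline, assuming
off-the-shelf 3D reconstruction and expandable scene generation. All
names and data structures are illustrative placeholders."""
from dataclasses import dataclass
import random

@dataclass
class Scene:
    chunks: list   # placeholder for 3D scene chunks
    size: str      # "S", "M", or "L"

def reconstruct_3d(image) -> Scene:
    # Stand-in for a 3D reconstruction model that lifts one image to a scene.
    return Scene(chunks=[image], size="S")

def expand_scene(scene: Scene, size: str) -> Scene:
    # Stand-in for expandable scene generation (growing the scene outward).
    factor = {"S": 1, "M": 4, "L": 16}[size]
    return Scene(chunks=scene.chunks * factor, size=size)

def render_pseudo_sketch(scene: Scene) -> str:
    # Stand-in for projecting the scene into a layout sketch label.
    return f"sketch_of_{len(scene.chunks)}_chunks"

def bootstrap_dataset(seed_images, num_scenes=1000):
    """Synthesize (pseudo-sketch, scene) training pairs from a few images."""
    pairs = []
    for _ in range(num_scenes):
        scene = reconstruct_3d(random.choice(seed_images))
        scene = expand_scene(scene, random.choice(["S", "M", "L"]))
        pairs.append((render_pseudo_sketch(scene), scene))
    return pairs
```

Under these assumptions, a handful of seed images yields arbitrarily many paired examples of varying size and layout, which is what makes end-to-end training feasible despite the initial data scarcity.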

๐Ÿ“ Abstract
World generation is a fundamental capability for applications like video games, simulation, and robotics. However, existing approaches face three main obstacles: controllability, scalability, and efficiency. End-to-end scene generation models have been limited by data scarcity, while object-centric generation approaches rely on fixed-resolution representations that degrade fidelity for larger scenes. Training-free approaches, while flexible, are often slow and computationally expensive at inference time. We present NuiWorld, a framework that attempts to address these challenges. To overcome data scarcity, we propose a generative bootstrapping strategy that starts from a few input images. Leveraging recent 3D reconstruction and expandable scene generation techniques, we synthesize scenes of varying sizes and layouts, producing enough data to train an end-to-end model. Furthermore, our framework enables controllability through pseudo-sketch labels and demonstrates a degree of generalization to previously unseen sketches. Our approach represents scenes as a collection of variable-size scene chunks, which are compressed into a flattened vector-set representation. This significantly reduces the token length for large scenes, enabling consistent geometric fidelity across scene sizes while improving training and inference efficiency.
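The key property of the chunked vector-set representation is that the token count scales with the number of chunks rather than with voxel resolution. The sketch below illustrates that idea with assumed shapes and a toy random-projection encoder (encode_chunk, LATENTS_PER_CHUNK, LATENT_DIM are all hypothetical); in the actual framework this compression is learned, not hand-coded.

```python
"""Illustrative sketch of a chunked, flattened vector-set representation:
each scene chunk is compressed to a small fixed set of latent vectors,
and the per-chunk sets are concatenated into one token sequence. Shapes
and the encoder are assumptions for illustration only."""
import numpy as np

LATENTS_PER_CHUNK = 64   # assumed: fixed-size vector set per chunk
LATENT_DIM = 32          # assumed latent width

def encode_chunk(chunk_voxels: np.ndarray) -> np.ndarray:
    # Toy stand-in encoder: random projection plus uniform subsampling.
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((chunk_voxels.size, LATENT_DIM))
    pooled = chunk_voxels.reshape(-1, 1) * proj                # (V, D)
    idx = np.linspace(0, len(pooled) - 1, LATENTS_PER_CHUNK).astype(int)
    return pooled[idx]                                         # (64, D)

def flatten_scene(chunks: list) -> np.ndarray:
    # Flatten per-chunk vector sets into a single token sequence.
    return np.concatenate([encode_chunk(c) for c in chunks], axis=0)

# Token count grows with the number of chunks, not with voxel resolution:
chunks = [np.ones((16, 16, 16)) for _ in range(9)]   # a 3x3-chunk scene
tokens = flatten_scene(chunks)
print(tokens.shape)   # (576, 32), versus 9 * 16**3 = 36864 dense voxels
```

Under these toy assumptions, the nine-chunk scene is represented by 576 tokens instead of 36,864 dense voxels, and enlarging the scene grows the sequence linearly while per-chunk fidelity stays fixed, which is consistent with the efficiency and scale-consistency claims above.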
Problem

Research questions and friction points this paper is trying to address.

controllability
scalability
efficiency
data scarcity
scene generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

generative bootstrapping
scene chunks
vector-set representation
controllable generation
scalable world generation
๐Ÿ”Ž Similar Papers
H
Han-Hung Lee
Simon Fraser University
C
Cheng-Yu Yang
National Yang Ming Chiao Tung University
Yu-Lun Liu
Yu-Lun Liu
Assistant Professor, National Yang Ming Chiao Tung University
Computer VisionImage ProcessingMachine LearningDeep LearningComputational Photography
A
Angel X. Chang
Simon Fraser University, CIFAR AI Chair, Amii