NeoWorld: Neural Simulation of Explorable Virtual Worlds via Progressive 3D Unfolding

2025-09-29
AI Summary
This paper introduces a paradigm for constructing interactive 3D virtual worlds from a single image. The core challenge it addresses is rendering user-explored regions photorealistically in 3D while synthesizing non-interacted areas efficiently. Methodologically, it proposes a progressive 3D unfolding mechanism: foreground objects are modeled as complete, object-centric 3D representations and rendered via differentiable rendering, while the background is synthesized in 2D in a semantically coherent way. Notably, it is the first method to support natural-language-driven editing of object appearance and physically plausible dynamic responses. The approach combines representation learning, object-level 3D generation, and a hybrid 2D-3D scene architecture. Evaluated on the WorldScore benchmark, it significantly outperforms existing 2D and 2.5D methods, demonstrating superior visual fidelity, real-time interactivity, and immersive free-viewpoint navigation.

๐Ÿ“ Abstract
We introduce NeoWorld, a deep learning framework for generating interactive 3D virtual worlds from a single input image. Inspired by the on-demand worldbuilding concept in the science fiction novel Simulacron-3 (1964), our system constructs expansive environments where only the regions actively explored by the user are rendered with high visual realism through object-centric 3D representations. Unlike previous approaches that rely on global world generation or 2D hallucination, NeoWorld models key foreground objects in full 3D, while synthesizing backgrounds and non-interacted regions in 2D to ensure efficiency. This hybrid scene structure, implemented with cutting-edge representation learning and object-to-3D techniques, enables flexible viewpoint manipulation and physically plausible scene animation, allowing users to control object appearance and dynamics using natural language commands. As users interact with the environment, the virtual world progressively unfolds with increasing 3D detail, delivering a dynamic, immersive, and visually coherent exploration experience. NeoWorld significantly outperforms existing 2D and depth-layered 2.5D methods on the WorldScore benchmark.
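The on-demand idea described above, where a region stays a cheap 2D layer until the user interacts with it and only then is lifted to a full 3D object, can be illustrated with a toy Python sketch. This is not NeoWorld's implementation: the classes `SceneObject` and `HybridScene` and the method `lift_to_3d` are hypothetical, and `lift_to_3d` merely flips a flag where a real system would call an object-to-3D model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SceneObject:
    name: str
    # "2d": the object exists only as pixels in the 2D background layer.
    # "3d": the object has been lifted to a (hypothetical) object-centric 3D asset.
    representation: str = "2d"


@dataclass
class HybridScene:
    objects: Dict[str, SceneObject] = field(default_factory=dict)
    # Names of lifted objects, in the order the user first reached them.
    unfold_log: List[str] = field(default_factory=list)

    def add(self, name: str) -> None:
        # Every object starts out in the cheap 2D representation.
        self.objects[name] = SceneObject(name)

    def lift_to_3d(self, name: str) -> None:
        # Hypothetical placeholder for an object-to-3D step; here it only flips a flag.
        obj = self.objects[name]
        if obj.representation == "2d":
            obj.representation = "3d"
            self.unfold_log.append(name)

    def interact(self, name: str) -> SceneObject:
        # On-demand unfolding: lift an object only when the user interacts with it.
        # Repeated interactions do not lift it again.
        self.lift_to_3d(name)
        return self.objects[name]

    def count(self, representation: str) -> int:
        # Number of objects currently held in the given representation.
        return sum(o.representation == representation for o in self.objects.values())
```

For example, adding three objects and interacting with one of them twice lifts only that object, and only once, leaving the other two in 2D. This lazy promotion is what keeps the 3D cost proportional to what the user actually explores.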
Problem

Research questions and friction points this paper is trying to address.

Generating interactive 3D worlds from single images
Progressively rendering explored regions with high realism
Enabling flexible viewpoint manipulation and natural-language control of object appearance and dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates 3D virtual worlds from a single image
Uses a hybrid of 3D foreground and 2D background synthesis
Progressively unfolds 3D detail during user exploration
Yanpeng Zhao
University of Edinburgh
Natural Language Understanding
Shanyan Guan
vivo Mobile Communication Co., Ltd.
Yunbo Wang
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University
Yanhao Ge
vivo Mobile Communication Co., Ltd.
Wei Li
vivo Mobile Communication Co., Ltd.
Xiaokang Yang
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University