SceneFoundry: Generating Interactive Infinite 3D Worlds

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D scene generation methods struggle to construct large-scale indoor environments that simultaneously offer functional articulated components, semantic diversity, and robot interactivity. This work proposes a language-guided diffusion framework that, for the first time, integrates large language models to control global layout with diffusion models that generate articulated, movable furniture. A differentiable guidance mechanism is introduced to enforce physical plausibility and interactive functionality. Leveraging a large-scale 3D asset repository, the method generates apartment-scale scenes from natural language instructions, producing structurally coherent, semantically consistent, and functionally interactive environments. This approach significantly enhances the scalability and realism of simulated environments for embodied intelligence research.

📝 Abstract
The ability to automatically generate large-scale, interactive, and physically realistic 3D environments is crucial for advancing robotic learning and embodied intelligence. However, existing generative approaches often fail to capture the functional complexity of real-world interiors, particularly those containing articulated objects with movable parts essential for manipulation and navigation. This paper presents SceneFoundry, a language-guided diffusion framework that generates apartment-scale 3D worlds with functionally articulated furniture and semantically diverse layouts for robotic training. From natural language prompts, an LLM module controls floor layout generation, while diffusion-based posterior sampling efficiently populates the scene with articulated assets from large-scale 3D repositories. To ensure physical usability, SceneFoundry employs differentiable guidance functions to regulate object quantity, prevent articulation collisions, and maintain sufficient walkable space for robotic navigation. Extensive experiments demonstrate that our framework generates structurally valid, semantically coherent, and functionally interactive environments across diverse scene types and conditions, enabling scalable embodied AI research.
Project page: https://anc891203.github.io/SceneFoundry-Demo/
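The abstract's core mechanism, diffusion-based sampling steered by differentiable guidance costs, can be illustrated with a toy sketch. This is not the paper's implementation: the object representation (2D furniture centers), the collision penalty, the noise schedule, and all function names below are hypothetical stand-ins showing how a gradient of a differentiable cost can nudge samples toward collision-free layouts at each denoising step.

```python
import numpy as np

def collision_grad(pos, radius=0.5):
    """Gradient of a pairwise overlap penalty (cost = overlap^2 per pair).
    Pushes objects apart whenever their circles of the given radius overlap.
    pos: (N, 2) array of object centers -- a toy stand-in for a scene layout."""
    grad = np.zeros_like(pos)
    n = len(pos)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = pos[i] - pos[j]
            dist = np.linalg.norm(d) + 1e-8
            overlap = 2 * radius - dist
            if overlap > 0:
                # d(overlap^2)/d(pos_i) = -2 * overlap * (d / dist)
                grad[i] += -2.0 * overlap * d / dist
    return grad

def guided_denoising(pos_init, n_steps=50, guidance_scale=0.1, seed=0):
    """Toy reverse-diffusion loop: each step injects shrinking noise, then
    takes a gradient step that lowers the differentiable collision cost,
    and clips positions to stay inside a fixed 'room' boundary."""
    rng = np.random.default_rng(seed)
    pos = pos_init.copy()
    for t in range(n_steps, 0, -1):
        sigma = 0.5 * t / n_steps                      # illustrative schedule
        pos = pos + 0.05 * sigma * rng.normal(size=pos.shape)
        pos = pos - guidance_scale * collision_grad(pos)  # guidance step
        pos = np.clip(pos, -4.0, 4.0)                  # walkable-area bound
    return pos
```

In the same spirit, the paper's other guidance terms (object quantity, articulation clearance, walkable space) would each contribute an additional differentiable cost whose gradient is summed into the per-step update.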
Problem

Research questions and friction points this paper is trying to address.

interactive 3D environments
functional complexity
articulated objects
robotic learning
embodied intelligence
Innovation

Methods, ideas, or system contributions that make the work stand out.

language-guided diffusion
articulated 3D assets
differentiable guidance
embodied AI
interactive 3D generation
ChunTeng Chen
National Yang Ming Chiao Tung University
YiChen Hsu
National Tsing Hua University
YiWen Liu
National Yang Ming Chiao Tung University
WeiFang Sun
NVIDIA AI Technology Center
TsaiChing Ni
National Yang Ming Chiao Tung University
ChunYi Lee
National Taiwan University
Min Sun
Associate Professor at National Tsing Hua University; Principal Applied Scientist at Amazon
computer vision, machine learning, deep learning, and AI
YuanFu Yang
National Yang Ming Chiao Tung University