SceneFoundry: Generating Interactive Infinite 3D Worlds

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D scene generation methods struggle to construct large-scale indoor environments that simultaneously offer functional articulated components, semantic diversity, and robot interactivity. This work proposes a language-guided diffusion framework that, for the first time, integrates large language models to control global layout with diffusion models that generate articulated, movable furniture. A differentiable guidance mechanism is introduced to enforce physical plausibility and interactive functionality. Leveraging a large-scale 3D asset repository, the method generates apartment-scale scenes from natural language instructions, producing structurally coherent, semantically consistent, and functionally interactive environments. This approach significantly enhances the scalability and realism of simulated environments for embodied intelligence research.

📝 Abstract
The ability to automatically generate large-scale, interactive, and physically realistic 3D environments is crucial for advancing robotic learning and embodied intelligence. However, existing generative approaches often fail to capture the functional complexity of real-world interiors, particularly those containing articulated objects with movable parts essential for manipulation and navigation. This paper presents SceneFoundry, a language-guided diffusion framework that generates apartment-scale 3D worlds with functionally articulated furniture and semantically diverse layouts for robotic training. From natural language prompts, an LLM module controls floor layout generation, while diffusion-based posterior sampling efficiently populates the scene with articulated assets from large-scale 3D repositories. To ensure physical usability, SceneFoundry employs differentiable guidance functions to regulate object quantity, prevent articulation collisions, and maintain sufficient walkable space for robotic navigation. Extensive experiments demonstrate that our framework generates structurally valid, semantically coherent, and functionally interactive environments across diverse scene types and conditions, enabling scalable embodied AI research.
Project page: https://anc891203.github.io/SceneFoundry-Demo/
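The abstract's core mechanism, diffusion-based sampling steered by differentiable guidance costs, can be illustrated with a toy sketch. This is not the paper's implementation: the object representation (2D furniture centers), the collision penalty, the noise schedule, and all function names below are hypothetical stand-ins showing how a gradient of a differentiable cost can nudge samples toward collision-free layouts at each denoising step.

```python
import numpy as np

def collision_grad(pos, radius=0.5):
    """Gradient of a pairwise overlap penalty (cost = overlap^2 per pair).
    Pushes objects apart whenever their circles of the given radius overlap.
    pos: (N, 2) array of object centers -- a toy stand-in for a scene layout."""
    grad = np.zeros_like(pos)
    n = len(pos)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = pos[i] - pos[j]
            dist = np.linalg.norm(d) + 1e-8
            overlap = 2 * radius - dist
            if overlap > 0:
                # d(overlap^2)/d(pos_i) = -2 * overlap * (d / dist)
                grad[i] += -2.0 * overlap * d / dist
    return grad

def guided_denoising(pos_init, n_steps=50, guidance_scale=0.1, seed=0):
    """Toy reverse-diffusion loop: each step injects shrinking noise, then
    takes a gradient step that lowers the differentiable collision cost,
    and clips positions to stay inside a fixed 'room' boundary."""
    rng = np.random.default_rng(seed)
    pos = pos_init.copy()
    for t in range(n_steps, 0, -1):
        sigma = 0.5 * t / n_steps                      # illustrative schedule
        pos = pos + 0.05 * sigma * rng.normal(size=pos.shape)
        pos = pos - guidance_scale * collision_grad(pos)  # guidance step
        pos = np.clip(pos, -4.0, 4.0)                  # walkable-area bound
    return pos
```

In the same spirit, the paper's other guidance terms (object quantity, articulation clearance, walkable space) would each contribute an additional differentiable cost whose gradient is summed into the per-step update.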
Problem

Research questions and friction points this paper is trying to address.

interactive 3D environments
functional complexity
articulated objects
robotic learning
embodied intelligence
Innovation

Methods, ideas, or system contributions that make the work stand out.

language-guided diffusion
articulated 3D assets
differentiable guidance
embodied AI
interactive 3D generation
ChunTeng Chen
National Yang Ming Chiao Tung University
YiChen Hsu
National Tsing Hua University
YiWen Liu
National Yang Ming Chiao Tung University
WeiFang Sun
NVIDIA AI Technology Center
TsaiChing Ni
National Yang Ming Chiao Tung University
ChunYi Lee
National Taiwan University
Min Sun
Associate Professor at National Tsing Hua University; Principal Applied Scientist at Amazon
computer vision, machine learning, deep learning, and AI
YuanFu Yang
National Yang Ming Chiao Tung University