Closing the Loop: Unified 3D Scene Generation and Immersive Interaction via LLM-RL Coupling

📅 2026-05-07

🤖 AI Summary
This work addresses the disconnect between existing language-driven 3D scene generation methods and user interaction, which limits system adaptability and immersion. The paper introduces a closed-loop framework that unifies language-driven 3D generation with immersive interaction. It uses a large language model to construct structured scene representations, employs reinforcement learning to optimize spatial layouts under geometric and semantic constraints, and continuously aligns the generated content with human perception and task requirements through human-in-the-loop feedback in virtual reality. On the ALFRED benchmark, the approach achieves state-of-the-art performance in task-oriented scene generation, and user studies show significant improvements in immersion, interaction quality, and task efficiency compared to prior methods.
📝 Abstract
Recent advances in large language models (LLMs) have significantly improved language-driven 3D content generation, but most existing approaches still treat scene generation and user interaction as separate processes, limiting the adaptability and immersive potential of interactive multimedia systems. This paper presents a unified framework that closes the loop between language-driven 3D scene generation and immersive user interaction. Given natural language instructions, the system first constructs structured scene representations using LLMs, and then optimizes spatial layouts via reinforcement learning under geometric and semantic constraints. The generated environments are deployed in a virtual reality setting to enable an HRI-in-the-loop process, in which user interactions provide continuous feedback that aligns the generated content with human perception and usability. By tightly coupling generation and interaction, the proposed framework enables more responsive, adaptive, and realistic multimedia experiences. Experiments on the ALFRED benchmark demonstrate state-of-the-art performance in task-based scene generation. Furthermore, qualitative results and user studies show consistent improvements in immersion, interaction quality, and task efficiency, highlighting the importance of closed-loop integration of generation and interaction for next-generation multimedia systems. Our project page can be found at https://proj-showcase.github.io/h3ds/.
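The abstract describes a loop of structured scene construction, constrained layout optimization, and user feedback. The sketch below illustrates the shape of such a loop under heavy assumptions; it is not the authors' implementation. The `SceneSpec` format, the overlap/bounds/"near" penalty terms, the `update_weights` rule, and `simulated_user` are all hypothetical. Random-perturbation hill climbing stands in for the paper's reinforcement-learning policy. No LLM or VR component is included: the scene spec is written by hand, and user feedback is simulated by flagging whichever constraint type is still violated.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SceneObject:
    """Axis-aligned footprint of one object; (x, y) is its center in meters."""
    name: str
    width: float
    depth: float
    x: float = 0.0
    y: float = 0.0


@dataclass
class SceneSpec:
    """Hypothetical structured scene, standing in for what an LLM might emit."""
    room_w: float
    room_d: float
    objects: dict[str, SceneObject]
    relations: list[tuple[str, str, str]]  # e.g. ("chair", "near", "table")


def overlap_area(a: SceneObject, b: SceneObject) -> float:
    dx = min(a.x + a.width / 2, b.x + b.width / 2) - max(a.x - a.width / 2, b.x - b.width / 2)
    dy = min(a.y + a.depth / 2, b.y + b.depth / 2) - max(a.y - a.depth / 2, b.y - b.depth / 2)
    return max(dx, 0.0) * max(dy, 0.0)


def out_of_bounds(o: SceneObject, spec: SceneSpec) -> float:
    # Total distance by which the footprint extends past the room [0, room_w] x [0, room_d].
    return (max(o.width / 2 - o.x, 0.0) + max(o.x + o.width / 2 - spec.room_w, 0.0)
            + max(o.depth / 2 - o.y, 0.0) + max(o.y + o.depth / 2 - spec.room_d, 0.0))


def penalties(spec: SceneSpec, near_dist: float = 1.5) -> tuple[float, float]:
    """Return (geometric, semantic) penalties; both are 0 when all constraints hold."""
    objs = list(spec.objects.values())
    geo = sum(overlap_area(a, b) for i, a in enumerate(objs) for b in objs[i + 1:])
    geo += sum(out_of_bounds(o, spec) for o in objs)
    sem = 0.0
    for subj, rel, obj in spec.relations:
        a, b = spec.objects[subj], spec.objects[obj]
        if rel == "near":
            sem += max(math.hypot(a.x - b.x, a.y - b.y) - near_dist, 0.0)
    return geo, sem


def layout_reward(spec: SceneSpec, weights: dict[str, float]) -> float:
    geo, sem = penalties(spec)
    return -(weights["geometric"] * geo + weights["semantic"] * sem)


def optimize_layout(spec, weights, steps=2000, step_size=0.3, seed=0) -> float:
    """Hill climbing over object positions (a simple stand-in for an RL policy)."""
    rng = random.Random(seed)
    best = layout_reward(spec, weights)
    for _ in range(steps):
        o = rng.choice(list(spec.objects.values()))
        old = (o.x, o.y)
        o.x += rng.uniform(-step_size, step_size)
        o.y += rng.uniform(-step_size, step_size)
        r = layout_reward(spec, weights)
        if r >= best:
            best = r
        else:
            o.x, o.y = old  # revert moves that lower the reward
    return best


def simulated_user(spec: SceneSpec, tol: float = 1e-6):
    """Hypothetical feedback source: flag the constraint type that is still violated."""
    geo, sem = penalties(spec)
    if geo > tol:
        return "geometric"
    if sem > tol:
        return "semantic"
    return None


def update_weights(weights: dict[str, float], flagged: str, lr: float = 0.5) -> dict[str, float]:
    """Up-weight the flagged constraint type; returns a new dict."""
    new = dict(weights)
    new[flagged] *= 1.0 + lr
    return new


def closed_loop(spec: SceneSpec, rounds: int = 3) -> dict[str, float]:
    weights = {"geometric": 1.0, "semantic": 1.0}
    for r in range(rounds):
        optimize_layout(spec, weights, seed=r)
        flagged = simulated_user(spec)
        if flagged is None:
            break  # the simulated user accepts the layout
        weights = update_weights(weights, flagged)
    return weights
```

Each round re-optimizes the layout and then lets the (simulated) user shift the relative weight of geometric versus semantic constraints, which mirrors in miniature the idea of feedback continuously reshaping what the generator optimizes for.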
Problem

Research questions and friction points this paper is trying to address.

3D scene generation
immersive interaction
language-driven generation
interactive multimedia systems
user feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-RL coupling
closed-loop 3D generation
immersive interaction
HRI-in-the-loop
language-driven scene generation