ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing immersive VR scene generation struggles to balance geometric complexity against rendering cost: high-polygon modeling demands a complex creation-and-decimation pipeline, while massive Gaussian splatting offers limited visual realism and spatial coherence. Method: The paper proposes a lightweight hierarchical representation in which low-polygon geometric proxies (e.g., simplified terrain and billboard meshes) serve as structural scaffolds, and decoupled RGBA texture synthesis supplies high-fidelity appearance. A proxy-guided texturing paradigm integrates VLM-based modeling agents, augmented with semantic-grid analysis, to enable spatially aware text-to-scene generation. The system jointly synthesizes alpha-textured geometry, dynamic effects, and ambient audio for real-time rendering on mobile VR headsets. Contribution/Results: Experiments demonstrate improvements over state-of-the-art Gaussian-splatting and simplified-mesh baselines in visual realism, spatial consistency, and rendering efficiency, achieving real-time performance on resource-constrained mobile VR platforms.

📝 Abstract
Automatic creation of 3D scenes for immersive VR presence has been a significant research focus for decades. However, existing methods often rely on either high-poly mesh modeling with post-hoc simplification or massive 3D Gaussians, resulting in a complex pipeline or limited visual realism. In this paper, we demonstrate that such exhaustive modeling is unnecessary for achieving compelling immersive experience. We introduce ImmerseGen, a novel agent-guided framework for compact and photorealistic world modeling. ImmerseGen represents scenes as hierarchical compositions of lightweight geometric proxies, i.e., simplified terrain and billboard meshes, and generates photorealistic appearance by synthesizing RGBA textures onto these proxies. Specifically, we propose terrain-conditioned texturing for user-centric base world synthesis, and RGBA asset texturing for midground and foreground scenery. This reformulation offers several advantages: (i) it simplifies modeling by enabling agents to guide generative models in producing coherent textures that integrate seamlessly with the scene; (ii) it bypasses complex geometry creation and decimation by directly synthesizing photorealistic textures on proxies, preserving visual quality without degradation; (iii) it enables compact representations suitable for real-time rendering on mobile VR headsets. To automate scene creation from text prompts, we introduce VLM-based modeling agents enhanced with semantic grid-based analysis for improved spatial reasoning and accurate asset placement. ImmerseGen further enriches scenes with dynamic effects and ambient audio to support multisensory immersion. Experiments on scene generation and live VR showcases demonstrate that ImmerseGen achieves superior photorealism, spatial coherence and rendering efficiency compared to prior methods. Project webpage: https://immersegen.github.io.
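The alpha-textured proxy idea in the abstract rests on standard alpha compositing: the RGBA texture's alpha channel carves a scenery silhouette (e.g., a tree) out of a flat billboard quad, and the renderer blends those texels over whatever lies behind the proxy. A minimal NumPy sketch of that blend, with an illustrative toy texture rather than anything from the paper:

```python
import numpy as np

def composite_rgba(billboard: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Alpha-composite an RGBA billboard texture over an RGB background.

    billboard:  (H, W, 4) float array in [0, 1]; the alpha channel carves
                the scenery silhouette out of an otherwise flat quad.
    background: (H, W, 3) float array in [0, 1].
    """
    rgb, alpha = billboard[..., :3], billboard[..., 3:4]
    # Standard "over" operator: alpha-weighted foreground plus the remainder
    # of the background showing through where the proxy is transparent.
    return alpha * rgb + (1.0 - alpha) * background

# Toy 2x2 texture: one opaque green texel, the rest fully transparent.
bb = np.zeros((2, 2, 4))
bb[0, 0] = [0.0, 1.0, 0.0, 1.0]     # opaque green texel
bg = np.full((2, 2, 3), 0.5)        # uniform grey background
out = composite_rgba(bb, bg)        # green where opaque, grey elsewhere
```

This is why a handful of quads can stand in for detailed geometry: all the visual complexity lives in the texture and its alpha mask, not in the mesh.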
Problem

Research questions and friction points this paper is trying to address.

How to automate photorealistic 3D scene creation for VR from text prompts
How to avoid exhaustive high-poly modeling and post-hoc mesh simplification via lightweight geometric proxies
How to achieve real-time rendering on resource-constrained mobile VR headsets with compact representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agent-guided hierarchical lightweight geometric proxies
Terrain-conditioned and RGBA asset texturing
VLM-based modeling agents for spatial reasoning
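The semantic-grid-based spatial reasoning can be pictured as labeling terrain cells and letting the agent filter them by an asset's placement rule. The grid labels, function, and rule below are hypothetical, sketched only to illustrate the idea, not the paper's actual interface:

```python
from typing import List, Set, Tuple

# Hypothetical semantic grid: each terrain cell carries a label such as
# "water", "meadow", or "rock"; an agent places an asset (e.g., a tent)
# only on cells whose label satisfies the asset's placement rule.
Grid = List[List[str]]

def candidate_cells(grid: Grid, allowed: Set[str]) -> List[Tuple[int, int]]:
    """Return (row, col) cells whose semantic label permits placement."""
    return [(r, c)
            for r, row in enumerate(grid)
            for c, label in enumerate(row)
            if label in allowed]

grid = [["water", "meadow"],
        ["meadow", "rock"]]
spots = candidate_cells(grid, {"meadow"})   # cells where a tent may go
```

Grounding placement in discrete labeled cells, rather than free-form coordinates, is what lets a VLM agent reason about "where" symbolically before committing to geometry.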