PSGS: Text-driven Panorama Sliding Scene Generation via Gaussian Splatting

📅 2026-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text-driven 3D scene generation methods struggle to produce semantically rich and photorealistic scenes due to the scarcity of aligned 3D-text data and multi-view inconsistencies. To address these limitations, this work proposes PSGS, a two-stage framework that first generates semantically coherent panoramic images through layout reasoning and a self-refinement module, then constructs a globally consistent 3D Gaussian splatting point cloud via a panoramic sliding mechanism. The approach innovatively integrates structured spatial relationship parsing, iterative feedback from multimodal large language models (MLLMs), and a panoramic sliding sampling strategy, complemented by depth and semantic consistency losses to enhance both semantic coherence and fine-grained fidelity. Experimental results demonstrate that PSGS outperforms current methods in generation quality and realism, offering a scalable solution for high-fidelity content creation in immersive applications such as VR and AR.

📝 Abstract
Generating realistic 3D scenes from text is crucial for immersive applications like VR, AR, and gaming. While text-driven approaches promise efficiency, existing methods suffer from limited 3D-text data and inconsistent multi-view stitching, resulting in overly simplistic scenes. To address this, we propose PSGS, a two-stage framework for high-fidelity panoramic scene generation. First, a novel two-layer optimization architecture generates semantically coherent panoramas: a layout reasoning layer parses text into structured spatial relationships, while a self-optimization layer refines visual details via iterative MLLM feedback. Second, our panorama sliding mechanism initializes globally consistent 3D Gaussian Splatting point clouds by strategically sampling overlapping perspectives. By incorporating depth and semantic coherence losses during training, we greatly improve the quality and detail fidelity of rendered scenes. Our experiments demonstrate that PSGS outperforms existing methods in panorama generation and produces more appealing 3D scenes, offering a robust solution for scalable immersive content creation.
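The panorama sliding mechanism described above samples overlapping perspective windows from a single panoramic image to seed a globally consistent point cloud. The paper does not publish its sampling code; the following is a minimal illustrative sketch, assuming an equirectangular panorama, a fixed horizontal field of view, and a configurable overlap ratio (all names and parameters here are hypothetical, not the authors' implementation):

```python
import numpy as np

def sliding_panorama_crops(pano, fov_deg=90.0, overlap=0.5):
    """Sample overlapping horizontal windows from an equirectangular panorama.

    pano:    H x W x C array covering 360 degrees of yaw.
    fov_deg: horizontal field of view of each sampled view.
    overlap: fraction of each window shared with its neighbor.
    Returns a list of (yaw_deg, crop) pairs; crops wrap across the 360° seam.
    """
    h, w = pano.shape[:2]
    win = int(round(w * fov_deg / 360.0))   # window width in pixels
    step_deg = fov_deg * (1.0 - overlap)    # yaw stride between adjacent views
    n_views = int(np.ceil(360.0 / step_deg))
    crops = []
    for i in range(n_views):
        yaw = (i * step_deg) % 360.0
        start = int(round(yaw / 360.0 * w))
        cols = np.arange(start, start + win) % w  # wrap at the panorama seam
        crops.append((yaw, pano[:, cols]))
    return crops
```

With a 90° field of view and 50% overlap, each point on the panorama is covered by two views, which is the kind of redundancy the depth and semantic consistency losses can then exploit during Gaussian Splatting optimization.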
Problem

Research questions and friction points this paper is trying to address.

text-driven 3D generation
panoramic scene generation
multi-view consistency
3D-text data scarcity
immersive content creation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Splatting
text-to-3D
panorama generation
multi-view consistency
semantic layout reasoning
Xin Zhang
East China University of Science and Technology, Shanghai, China
Shen Chen
Zhejiang University, Hangzhou, China
Jiale Zhou
Lei Li
Beijing Institute of Technology, Beijing, China