BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation

📅 2025-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D scene generation methods suffer from high storage overhead, structural distortion, and insufficient cross-modal collaborative modeling, which makes it hard to achieve photorealism, geometric fidelity, and a lightweight model at the same time. To address these challenges, we propose a cross-modal progressive 3D scene generation framework tailored for virtual reality. The method introduces a hierarchical depth prior-based regularization mechanism to enforce geometric consistency; designs a structured, context-guided hash grid compression scheme for efficient scene representation; and integrates 3D Gaussian splatting with incremental point cloud reconstruction, using cross-modal feature alignment so that one pipeline handles both text and image inputs. Experiments show that the approach significantly outperforms baselines across diverse scenes: the generated scenes are structurally coherent and geometrically accurate, and the model size is reduced by over 60%, substantially lowering memory and storage requirements.
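The hash grid compression is only named here, not specified. The sketch below, in Python with NumPy, shows the generic multi-resolution spatial hash encoding from Instant-NGP that hash-grid context models for Gaussian splatting commonly build on: each anchor position is mapped to a feature vector by trilinearly interpolating learned table entries at several grid resolutions. This is an assumption about the building block, not BloomScene's implementation; `spatial_hash`, `query_hash_grid`, the table sizes, `base_res`, and `growth` are hypothetical choices. In one common context-model design, the queried feature is then passed to a small network that predicts entropy-coding parameters for the anchor's attributes; that step is omitted here.

```python
import numpy as np

# Spatial-hash primes used by Instant-NGP; an illustrative choice here.
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)


def spatial_hash(coords: np.ndarray, table_size: int) -> np.ndarray:
    """Hash non-negative integer grid coordinates (..., 3) into [0, table_size)."""
    c = coords.astype(np.uint64)
    # XOR of coordinate * prime; uint64 array multiplication wraps modulo 2**64.
    h = (c[..., 0] * PRIMES[0]) ^ (c[..., 1] * PRIMES[1]) ^ (c[..., 2] * PRIMES[2])
    return (h % np.uint64(table_size)).astype(np.int64)


def query_hash_grid(positions: np.ndarray, tables: list, base_res: int = 16,
                    growth: float = 1.5) -> np.ndarray:
    """Return per-anchor context features of shape (N, levels * F).

    positions: (N, 3) anchor centers normalized to [0, 1].
    tables:    one (T, F) feature table per resolution level (hypothetical sizes).
    """
    feats = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)    # grid resolution at this level
        x = positions * res
        x0 = np.floor(x).astype(np.int64)        # lower corner of the enclosing cell
        w = x - x0                               # fractional offset inside the cell
        out = np.zeros((positions.shape[0], table.shape[1]))
        for corner in range(8):                  # trilinear interpolation over 8 corners
            offset = np.array([(corner >> d) & 1 for d in range(3)])
            idx = spatial_hash(x0 + offset, table.shape[0])
            weight = np.prod(np.where(offset == 1, w, 1.0 - w), axis=1)
            out += weight[:, None] * table[idx]
        feats.append(out)
    return np.concatenate(feats, axis=1)


rng = np.random.default_rng(0)
tables = [rng.normal(size=(2**14, 2)) for _ in range(4)]  # 4 levels, 2 features each
anchors = rng.random((5, 3))                               # 5 hypothetical anchors
print(query_hash_grid(anchors, tables).shape)              # (5, 8)
```

Because the eight interpolation weights sum to one, nearby anchors receive similar features, which is what lets a hash grid act as spatial context for otherwise unorganized anchors.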



📝 Abstract
With the widespread use of virtual reality applications, 3D scene generation has become a challenging new research frontier. 3D scenes have highly complex structures, and generated scenes must be dense and coherent and contain all necessary structures. Many current 3D scene generation methods rely on pre-trained text-to-image diffusion models and monocular depth estimators. However, the generated scenes occupy large amounts of storage space and often lack effective regularization, which leads to geometric distortions. To this end, we propose BloomScene, a lightweight structured 3D Gaussian splatting method for crossmodal scene generation, which creates diverse and high-quality 3D scenes from text or image inputs. Specifically, we propose a crossmodal progressive scene generation framework that generates coherent scenes using incremental point cloud reconstruction and 3D Gaussian splatting. Additionally, we propose a hierarchical depth prior-based regularization mechanism that imposes multi-level constraints on depth accuracy and smoothness to enhance the realism and continuity of the generated scenes. Finally, we propose a structured context-guided compression mechanism that uses structured hash grids to model the context of unorganized anchor attributes, which largely eliminates structural redundancy and reduces storage overhead. Comprehensive experiments across multiple scenes demonstrate the significant potential and advantages of our framework compared with several baselines.
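The abstract describes the regularizer only as multi-level constraints on depth accuracy and smoothness. One plausible reading, sketched below in Python with NumPy, is a loss evaluated over a depth pyramid: at each level it adds an L1 term between the rendered depth and the monocular depth prior, plus a total-variation smoothness term on the rendered depth. The pyramid construction, the L1 and total-variation choices, the weights `w_acc` and `w_smooth`, and all function names are assumptions for illustration, not BloomScene's actual formulation; in training, such a loss would be computed on differentiable tensors inside the 3D Gaussian splatting optimization loop rather than in NumPy.

```python
import numpy as np


def avg_pool2(d: np.ndarray) -> np.ndarray:
    """2x2 average pooling; drops an odd trailing row/column."""
    h, w = d.shape[0] // 2 * 2, d.shape[1] // 2 * 2
    d = d[:h, :w]
    return 0.25 * (d[0::2, 0::2] + d[1::2, 0::2] + d[0::2, 1::2] + d[1::2, 1::2])


def smoothness(d: np.ndarray) -> float:
    """Total-variation term: mean absolute difference between neighboring pixels."""
    return np.abs(np.diff(d, axis=0)).mean() + np.abs(np.diff(d, axis=1)).mean()


def hierarchical_depth_loss(rendered, prior, levels=3, w_acc=1.0, w_smooth=0.1):
    """Sum of accuracy + smoothness terms over a depth pyramid (hypothetical form).

    rendered: (H, W) depth rendered from the Gaussians.
    prior:    (H, W) depth from a monocular estimator, assumed already aligned
              in scale and shift. H and W must be at least 2**levels.
    """
    loss = 0.0
    for _ in range(levels):
        accuracy = np.abs(rendered - prior).mean()   # L1 agreement with the prior
        loss += w_acc * accuracy + w_smooth * smoothness(rendered)
        rendered, prior = avg_pool2(rendered), avg_pool2(prior)
    return loss


prior = np.tile(np.linspace(1.0, 3.0, 32), (32, 1))   # synthetic depth ramp
noisy = prior + 0.05 * np.random.default_rng(0).standard_normal(prior.shape)
print(hierarchical_depth_loss(noisy, prior))
```

Coarser pyramid levels average away pixel noise, so they mainly penalize large-scale depth disagreement, while the finest level also penalizes local noise through the smoothness term.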
Problem

Research questions and friction points this paper is trying to address.

3D scene generation
storage efficiency
realism adjustment
Innovation

Methods, ideas, or system contributions that make the work stand out.

BloomScene
3D scene generation
compression technology
Xiaolu Hou
Faculty of Informatics and Information Technologies, Slovak University of Technology, Slovakia
Cryptography, Hardware Security, AI Security
Mingcheng Li
Fudan University
Dingkang Yang
ByteDance
Multimodal Learning, Generative AI, Embodied AI
Jiawei Chen
Academy for Engineering and Technology, Fudan University
Ziyun Qian
Academy for Engineering and Technology, Fudan University
Xiao Zhao
Academy for Engineering and Technology, Fudan University
Yue Jiang
Academy for Engineering and Technology, Fudan University
Jinjie Wei
Fudan University
Large Language Model
Qingyao Xu
Academy for Engineering and Technology, Fudan University
Lihua Zhang
Wuhan University
computational biology, bioinformatics, data mining