SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting

📅 2025-06-10

🤖 AI Summary

Existing Language Gaussian Splatting (LGS) methods are mostly evaluated on 2D renderings of a handful of scenes, from viewpoints close to the training views, which says little about genuine 3D understanding. To address this, the paper introduces the first large-scale benchmark that evaluates LGS methods directly in 3D space, covering 1,060 scenes across three indoor datasets and one outdoor dataset. It systematically assesses three method categories: per-scene optimization-based, per-scene optimization-free, and generalizable approaches. The paper also introduces GaussianWorld-49K, a carefully curated 3DGS dataset of around 49K diverse indoor and outdoor scenes. The benchmark results show that generalizable methods run fast feed-forward inference on novel scenes without per-scene optimization and achieve superior 3D semantic segmentation performance. The summary further reports a bottleneck in current LGS methods: degraded 3D comprehension under distant viewpoints. The code, benchmark, and datasets are to be made public.

📝 Abstract

3D Gaussian Splatting (3DGS) serves as a highly performant and efficient encoding of scene geometry, appearance, and semantics. Moreover, grounding language in 3D scenes has proven to be an effective strategy for 3D scene understanding. Current work on Language Gaussian Splatting falls into three main groups: (i) per-scene optimization-based, (ii) per-scene optimization-free, and (iii) generalizable approaches. However, most of these methods are evaluated only on rendered 2D views of a handful of scenes, from viewpoints close to the training views, which limits insight into holistic 3D understanding. To address this gap, we propose the first large-scale benchmark that systematically assesses these three groups of methods directly in 3D space, evaluating on 1,060 scenes across three indoor datasets and one outdoor dataset. Benchmark results demonstrate a clear advantage of the generalizable paradigm, particularly in relaxing the scene-specific limitation, enabling fast feed-forward inference on novel scenes, and achieving superior segmentation performance. We further introduce GaussianWorld-49K, a carefully curated 3DGS dataset comprising around 49K diverse indoor and outdoor scenes obtained from multiple sources, with which we demonstrate that the generalizable approach can harness strong data priors. Our code, benchmark, and datasets will be made public to accelerate research in generalizable 3DGS scene understanding.
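
To make "evaluating directly in 3D space" concrete, here is a minimal sketch of one way such a check could look. It assumes each Gaussian carries a language feature vector and a ground-truth class label, assigns each Gaussian the class whose text embedding is most similar, and scores the result with mean IoU. The function names, the toy features, and the cosine-similarity labeling rule are illustrative assumptions, not the benchmark's actual protocol or API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def assign_labels(gaussian_feats, text_embeds):
    """Label each Gaussian with the index of its most similar class embedding.

    Hypothetical helper: the labeling rule (argmax cosine similarity)
    is an assumption made for illustration.
    """
    return [
        max(range(len(text_embeds)), key=lambda c: cosine(f, text_embeds[c]))
        for f in gaussian_feats
    ]

def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Toy data: two classes, four Gaussians with 2-D "language features".
text_embeds = [[1.0, 0.0], [0.0, 1.0]]
gaussian_feats = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
gt = [0, 1, 1, 1]

pred = assign_labels(gaussian_feats, text_embeds)
miou = mean_iou(pred, gt, num_classes=2)
print(pred, round(miou, 4))
```

Because the score is computed per Gaussian rather than per rendered pixel, it does not depend on which camera views are chosen, which is the contrast with the 2D evaluation the abstract criticizes.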

Problem

Research questions and friction points this paper is trying to address.

Assessing Language Gaussian Splatting methods in 3D space

Addressing limitations in holistic 3D scene understanding

Introducing a large-scale benchmark and dataset for 3DGS

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale 3D benchmark for Language Gaussian Splatting

Generalizable approach for fast feed-forward inference

GaussianWorld-49K dataset with around 49K diverse indoor and outdoor scenes

🔎 Similar Papers

3D Vision-Language Gaussian Splatting