CUS-GS: A Compact Unified Structured Gaussian Splatting Framework for Multimodal Scene Representation

📅 2025-11-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current Gaussian splatting methods show a disconnect between semantic understanding and geometric modeling: semantics-oriented approaches lack explicit 3D geometry, while structure-oriented approaches offer limited semantic abstraction. To bridge this gap, the paper proposes CUS-GS, a compact unified structured Gaussian Splatting representation that connects multimodal semantic features from foundation models (e.g., CLIP, DINOv2, SEEM) with a voxelized anchor structure. A multimodal latent feature allocation mechanism unifies appearance, geometry, and semantics across heterogeneous feature spaces, and a feature-aware significance evaluation strategy guides anchor growing and pruning, removing redundant or invalid anchors while maintaining semantic integrity. In the reported experiments, CUS-GS achieves performance competitive with state-of-the-art methods using as few as 6M parameters, compared with 35M for the closest rival.

📝 Abstract
Recent advances in Gaussian Splatting-based 3D scene representation have shown two major trends: semantics-oriented approaches that focus on high-level understanding but lack explicit 3D geometry modeling, and structure-oriented approaches that capture spatial structures yet provide limited semantic abstraction. To bridge this gap, we present CUS-GS, a compact unified structured Gaussian Splatting representation, which connects multimodal semantic features with structured 3D geometry. Specifically, we design a voxelized anchor structure that constructs a spatial scaffold, while extracting multimodal semantic features from a set of foundation models (e.g., CLIP, DINOv2, SEEM). Moreover, we introduce a multimodal latent feature allocation mechanism to unify appearance, geometry, and semantics across heterogeneous feature spaces, ensuring a consistent representation across multiple foundation models. Finally, we propose a feature-aware significance evaluation strategy to dynamically guide anchor growing and pruning, effectively removing redundant or invalid anchors while maintaining semantic integrity. Extensive experiments show that CUS-GS achieves competitive performance compared to state-of-the-art methods using as few as 6M parameters, an order of magnitude smaller than the closest rival at 35M, highlighting the excellent trade-off between performance and model efficiency of the proposed framework.
Problem

Research questions and friction points this paper is trying to address.

Bridging semantic understanding and 3D geometry modeling gaps
Unifying multimodal semantic features with structured 3D geometry
Achieving compact representation while maintaining competitive performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Voxelized anchor structure constructs spatial scaffold
Multimodal latent feature allocation unifies heterogeneous features
Feature-aware significance evaluation dynamically guides anchor growing and pruning
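
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind a voxelized anchor scaffold with score-guided growing and pruning. Everything in it is an assumption made for illustration: the function names (`voxelize_anchors`, `significance`, `update_anchors`), the score formula, the thresholds, and the choice to grow only along +x are hypothetical and are not taken from CUS-GS.

```python
from math import floor


def voxelize_anchors(points, voxel_size):
    """Snap 3D points to a voxel grid; return one anchor per occupied voxel.

    Anchors are placed at voxel centres (an illustrative choice).
    """
    occupied = set()
    for x, y, z in points:
        occupied.add((floor(x / voxel_size),
                      floor(y / voxel_size),
                      floor(z / voxel_size)))
    return {idx: tuple((i + 0.5) * voxel_size for i in idx)
            for idx in sorted(occupied)}


def significance(opacity, feature_signal, alpha=0.5):
    """Hypothetical score blending a rendering term with a semantic-feature term."""
    return alpha * opacity + (1.0 - alpha) * feature_signal


def update_anchors(anchors, scores, voxel_size, prune_below=0.1, grow_above=0.8):
    """Drop anchors with a low score; add a neighbour next to high-score anchors."""
    kept = {idx: pos for idx, pos in anchors.items()
            if scores.get(idx, 0.0) >= prune_below}
    grown = dict(kept)
    for idx in kept:
        if scores.get(idx, 0.0) > grow_above:
            # Grow along +x only, to keep the sketch short.
            neighbour = (idx[0] + 1, idx[1], idx[2])
            grown.setdefault(
                neighbour, tuple((i + 0.5) * voxel_size for i in neighbour))
    return grown


points = [(0.1, 0.2, 0.3), (0.4, 0.4, 0.4), (1.5, 0.0, 0.0), (-0.2, 0.0, 0.0)]
anchors = voxelize_anchors(points, voxel_size=1.0)
scores = {(0, 0, 0): 0.05, (1, 0, 0): 0.9, (-1, 0, 0): 0.5}
updated = update_anchors(anchors, scores, voxel_size=1.0)
```

In this toy run the four points fall into three voxels; the anchor with the lowest score is pruned and the highest-scoring anchor spawns one neighbour. A real system would compute the scores from training statistics rather than fixing them by hand.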
Authors

Yuhang Ming
Lecturer at Hangzhou Dianzi University
SLAM, VPR, Computer Vision, Robotics, Spatial AI
Chenxin Fang
School of Computer Science, Hangzhou Dianzi University
Xingyuan Yu
CAD & CG, Zhejiang University
Fan Zhang
School of Computer Science, University of Bristol
Weichen Dai
Hangzhou Dianzi University
3D Vision, SLAM, Brain-inspired intelligence
Wanzeng Kong
School of Computer Science, Hangzhou Dianzi University
Guofeng Zhang
CAD & CG, Zhejiang University