CUS-GS: A Compact Unified Structured Gaussian Splatting Framework for Multimodal Scene Representation

📅 2025-11-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current Gaussian Splatting methods show a disconnect between semantic understanding and geometric modeling: semantics-oriented approaches lack explicit 3D geometry, while structure-oriented approaches offer limited semantic abstraction. To bridge this gap, the paper proposes CUS-GS, a compact unified structured Gaussian Splatting representation that connects multimodal semantic features from foundation models (e.g., CLIP, DINOv2, SEEM) with a voxelized anchor structure. A multimodal latent feature allocation mechanism unifies appearance, geometry, and semantics across heterogeneous feature spaces, and a feature-aware significance evaluation strategy guides anchor growing and pruning, removing redundant or invalid anchors while maintaining semantic integrity. In the reported experiments, CUS-GS achieves performance competitive with state-of-the-art methods using as few as 6M parameters, which the authors describe as an order of magnitude smaller than the closest rival at 35M.

📝 Abstract

Recent advances in Gaussian Splatting-based 3D scene representation have shown two major trends: semantics-oriented approaches that focus on high-level understanding but lack explicit 3D geometry modeling, and structure-oriented approaches that capture spatial structures yet provide limited semantic abstraction. To bridge this gap, we present CUS-GS, a compact unified structured Gaussian Splatting representation, which connects multimodal semantic features with structured 3D geometry. Specifically, we design a voxelized anchor structure that constructs a spatial scaffold, while extracting multimodal semantic features from a set of foundation models (e.g., CLIP, DINOv2, SEEM). Moreover, we introduce a multimodal latent feature allocation mechanism to unify appearance, geometry, and semantics across heterogeneous feature spaces, ensuring a consistent representation across multiple foundation models. Finally, we propose a feature-aware significance evaluation strategy to dynamically guide anchor growing and pruning, effectively removing redundant or invalid anchors while maintaining semantic integrity. Extensive experiments show that CUS-GS achieves competitive performance compared to state-of-the-art methods using as few as 6M parameters (an order of magnitude smaller than the closest rival at 35M), highlighting the excellent trade-off between performance and model efficiency of the proposed framework.
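To make the "voxelized anchor structure" concrete, here is a minimal sketch of how a point cloud could be quantized into one anchor per occupied voxel. It is an illustration only, not the authors' implementation: the function name `voxelize_anchors`, the `voxel_size` parameter, and the choice to place each anchor at the voxel centre are all assumptions that do not come from the abstract.

```python
from collections import defaultdict
from math import floor


def voxelize_anchors(points, voxel_size):
    """Group 3D points by voxel cell and return one anchor per occupied cell.

    Assumption (not from the paper): an anchor sits at the centre of its
    voxel and simply records which input points fell into that cell.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Map integer voxel coordinates -> indices of the points inside that voxel.
    cells = defaultdict(list)
    for index, (x, y, z) in enumerate(points):
        key = (
            floor(x / voxel_size),
            floor(y / voxel_size),
            floor(z / voxel_size),
        )
        cells[key].append(index)

    # One anchor per occupied voxel, positioned at the voxel centre.
    anchors = {}
    for key, members in cells.items():
        centre = tuple((k + 0.5) * voxel_size for k in key)
        anchors[key] = {"position": centre, "members": members}
    return anchors


# Hypothetical toy input: four points, three occupied unit voxels.
example_points = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.2, 0.1, 0.1), (-0.1, 0.0, 0.0)]
example_anchors = voxelize_anchors(example_points, voxel_size=1.0)
```

Only occupied voxels produce anchors, so the scaffold stays sparse; a coarser `voxel_size` would give fewer anchors at the cost of spatial resolution.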

Problem

Research questions and friction points this paper is trying to address.

Bridging semantic understanding and 3D geometry modeling gaps

Unifying multimodal semantic features with structured 3D geometry

Achieving compact representation while maintaining competitive performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Voxelized anchor structure constructs spatial scaffold

Multimodal latent feature allocation unifies heterogeneous features

Feature-aware significance evaluation dynamically guides anchor growing and pruning
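The growing-and-pruning idea in the last item can be sketched as a score-and-threshold loop. The abstract does not give the scoring formula, so everything below is hypothetical: the score (opacity times a weighted sum of per-modality feature norms), the names `significance` and `update_anchors`, and the two thresholds are assumptions chosen only to show the control flow.

```python
from math import sqrt


def significance(anchor, weights):
    """Hypothetical score: opacity times a weighted sum of feature norms.

    `weights` maps a modality name (e.g. "clip") to its weight; this formula
    is an assumption, not the scoring rule used in the paper.
    """
    total = 0.0
    for name, weight in weights.items():
        feature = anchor["features"].get(name, [])
        total += weight * sqrt(sum(v * v for v in feature))
    return anchor["opacity"] * total


def update_anchors(anchors, weights, prune_below, grow_above):
    """Return (kept_ids, grow_ids) for one growing/pruning pass."""
    kept, grow = [], []
    for anchor_id, anchor in anchors.items():
        score = significance(anchor, weights)
        if score < prune_below:
            continue  # redundant or invalid anchor: drop it
        kept.append(anchor_id)
        if score > grow_above:
            grow.append(anchor_id)  # candidate for spawning new anchors nearby
    return kept, grow


# Hypothetical toy anchors with two feature modalities each.
toy_anchors = {
    "a": {"opacity": 1.0, "features": {"clip": [3.0, 4.0], "dino": [0.0]}},
    "b": {"opacity": 0.01, "features": {"clip": [3.0, 4.0], "dino": [1.0]}},
    "c": {"opacity": 0.5, "features": {"clip": [0.6, 0.8], "dino": [1.0]}},
}
kept_ids, grow_ids = update_anchors(
    toy_anchors, {"clip": 1.0, "dino": 1.0}, prune_below=0.1, grow_above=2.0
)
```

In this toy run, anchor "b" is pruned because its low opacity drives the score down even though its features are strong, which is the kind of interaction a feature-aware score is meant to capture.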
