๐ค AI Summary
To address the challenges of high memory consumption, label-space congestion, and difficulty in fine-grained segmentation arising from high-dimensional features in 3D Gaussian Splatting (3D-GS) semantic segmentation, this paper proposes a binary encoding and progressive learning framework. Methodologically, it introduces (1) a binary-to-decimal mapping to compress categorical features, drastically reducing GPU memory footprint; (2) a layer-wise decoding scheme employing coarse-to-fine binary representations to mitigate label conflicts; and (3) multi-stage independent subtask training coupled with opacity-aware joint fine-tuning to decouple rendering fidelity and semantic segmentation optimization. Evaluated on multiple benchmarks, the method achieves state-of-the-art segmentation accuracy while reducing GPU memory usage by 42% and accelerating inference by 3.1รโdemonstrating a favorable trade-off between computational efficiency and fine-grained semantic expressiveness.
๐ Abstract
3D Gaussian Splatting (3D-GS) has emerged as an efficient 3D representation and a promising foundation for semantic tasks like segmentation. However, existing 3D-GS-based segmentation methods typically rely on high-dimensional category features, which introduce substantial memory overhead. Moreover, fine-grained segmentation remains challenging due to label space congestion and the lack of stable multi-granularity control mechanisms. To address these limitations, we propose a coarse-to-fine binary encoding scheme for per-Gaussian category representation, which compresses each feature into a single integer via the binary-to-decimal mapping, drastically reducing memory usage. We further design a progressive training strategy that decomposes panoptic segmentation into a series of independent sub-tasks, reducing inter-class conflicts and thereby enhancing fine-grained segmentation capability. Additionally, we fine-tune opacity during segmentation training to address the incompatibility between photometric rendering and semantic segmentation, which often leads to foreground-background confusion. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art segmentation performance while significantly reducing memory consumption and accelerating inference.