🤖 AI Summary
This work addresses a limitation of conventional 3D Gaussian Splatting (3DGS) representations: they struggle to encode both high-frequency details and low-frequency smooth regions efficiently, and they lack structural awareness. Building on their earlier method, the authors propose a structure-aware hybrid 3D Gaussian representation that brings the artistic concepts of sketching and coloring to 3DGS. Without relying on external 3D line primitives, the method adaptively partitions Gaussians into Sketch Gaussians for edge delineation and Patch Gaussians for smooth regions, using multi-criteria density-based clustering and adaptive quality-driven refinement. The resulting separation supports layered progressive rendering and efficient streaming. Experiments across man-made and natural scenes report, at equivalent model size and relative to uniform pruning baselines, improvements of up to 1.74 dB in PSNR, 6.7% in SSIM, and 41.4% in LPIPS; for indoor scenes, visual quality is maintained with only 0.5% of the original model size.
📝 Abstract
We observe that Gaussians exhibit distinct roles and characteristics analogous to traditional artistic techniques: just as artists first sketch outlines before filling in broader areas with color, some Gaussians capture high-frequency features such as edges and contours, while others represent broader, smoother regions analogous to brush strokes that add volume and depth. Based on this observation, we propose a hybrid representation that categorizes Gaussians into (i) Sketch Gaussians, which represent high-frequency, boundary-defining features, and (ii) Patch Gaussians, which cover low-frequency, smooth regions. This semantic separation naturally enables layered progressive streaming, where the compact Sketch Gaussians establish the structural skeleton before Patch Gaussians incrementally refine volumetric detail. In this work, we extend our previous method to arbitrary 3D scenes by proposing a novel hierarchical adaptive categorization framework that operates directly on the 3DGS representation. Our approach employs multi-criteria density-based clustering combined with adaptive quality-driven refinement. This method eliminates the dependency on external 3D line primitives while ensuring effective parametric encoding. Our comprehensive evaluation across diverse scenes, including both man-made and natural environments, demonstrates that our method achieves improvements of up to 1.74 dB in PSNR, 6.7% in SSIM, and 41.4% in LPIPS at equivalent model sizes compared to uniform pruning baselines. For indoor scenes, our method can maintain visual quality with only 0.5% of the original model size. This structure-aware representation enables efficient storage, adaptive streaming, and rendering of high-fidelity 3D content across bandwidth-constrained networks and resource-limited devices.
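The layered progressive streaming described in the abstract can be illustrated with a minimal sketch. It assumes each Gaussian has already been labelled as Sketch or Patch by some categorization step, which is not reproduced here. The `Gaussian` record, the `importance` score, the `stream_layers` function, and the choice to send Patch Gaussians in descending order of importance are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Gaussian:
    gid: int           # hypothetical identifier of the Gaussian
    is_sketch: bool    # assumed output of a prior categorization step
    importance: float  # hypothetical score used to order Patch Gaussians


def stream_layers(gaussians: List[Gaussian], patch_chunk: int) -> Iterator[List[int]]:
    """Yield layers of Gaussian ids: one Sketch layer, then Patch chunks."""
    if patch_chunk <= 0:
        raise ValueError("patch_chunk must be positive")

    # Layer 0: all Sketch Gaussians, which establish the structural skeleton.
    yield [g.gid for g in gaussians if g.is_sketch]

    # Later layers: Patch Gaussians in fixed-size chunks.
    # Sorting by descending importance is an assumption made for illustration.
    patches = sorted(
        (g for g in gaussians if not g.is_sketch),
        key=lambda g: g.importance,
        reverse=True,
    )
    for start in range(0, len(patches), patch_chunk):
        yield [g.gid for g in patches[start:start + patch_chunk]]


# Toy scene: ids 0-2 are Sketch Gaussians, ids 3-9 are Patch Gaussians.
scene = [Gaussian(gid=i, is_sketch=(i < 3), importance=float(i)) for i in range(10)]
layers = list(stream_layers(scene, patch_chunk=4))
```

A client following this scheme could render after receiving the first layer and refine the image as each later chunk arrives, which is the behaviour the abstract attributes to the Sketch/Patch separation.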