🤖 AI Summary
To address the inconsistency between global mask regularization and local pixel-wise reconstruction loss in 3D Gaussian Splatting (3DGS), this paper proposes SVR-GS: a spatially variant regularization framework that dynamically generates per-pixel, view-dependent Gaussian contribution masks to impose differentiated sparsity constraints on low-importance regions. By aligning mask-based regularization with per-ray reconstruction error, SVR-GS significantly improves sparsification accuracy. The authors design three mask aggregation strategies, implement them efficiently in CUDA, and use gradient analysis to guide the final architectural choice. Evaluated on Tanks&Temples, Deep Blending, and Mip-NeRF360, SVR-GS reduces the number of Gaussians by 1.79× compared to MaskGS and by 5.63× compared to vanilla 3DGS, while incurring only marginal PSNR degradation (−0.50 dB and −0.40 dB, respectively). This yields substantially more compact models and faster inference without compromising visual fidelity.
📝 Abstract
3D Gaussian Splatting (3DGS) enables fast, high-quality novel view synthesis but typically relies on densification followed by pruning to optimize the number of Gaussians. Existing mask-based pruning, such as MaskGS, regularizes the global mean of the mask, which is misaligned with the local per-pixel (per-ray) reconstruction loss that determines image quality along individual camera rays. This paper introduces SVR-GS, a spatially variant regularizer that renders a per-pixel spatial mask from each Gaussian's effective contribution along the ray, thereby applying sparsity pressure where it matters: on low-importance Gaussians. We explore three spatial-mask aggregation strategies, implement them in CUDA, and conduct a gradient analysis to motivate our final design. Extensive experiments on Tanks&Temples, Deep Blending, and Mip-NeRF360 datasets demonstrate that, on average across the three datasets, the proposed SVR-GS reduces the number of Gaussians by 1.79× compared to MaskGS and 5.63× compared to 3DGS, while incurring only 0.50 dB and 0.40 dB PSNR drops, respectively. These gains translate into significantly smaller, faster, and more memory-efficient models, making them well-suited for real-time applications such as robotics, AR/VR, and mobile perception.
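The core distinction the abstract draws can be sketched in a few lines. The snippet below is a minimal, illustrative NumPy sketch (not the authors' CUDA implementation): it contrasts a MaskGS-style global mask penalty with a spatially variant penalty computed by rendering a per-pixel mask from each Gaussian's alpha-composited contribution along its ray. All names (`per_ray_hits`, `composite_weights`, etc.) and the specific aggregation (an alpha-weighted sum) are hypothetical assumptions for illustration; the paper explores three aggregation strategies.

```python
# Illustrative sketch: global vs. spatially variant mask regularization.
# Assumes one learnable mask value per Gaussian and a simple
# front-to-back alpha-compositing model along each camera ray.
import numpy as np

rng = np.random.default_rng(0)
num_gaussians, num_rays = 8, 4

# Learnable per-Gaussian mask values in [0, 1] (sigmoid of a logit in practice).
mask = rng.uniform(0.0, 1.0, size=num_gaussians)

# Hypothetical rasterizer output: for each ray, the indices of intersected
# Gaussians (front to back) and their alpha values.
per_ray_hits = [rng.choice(num_gaussians, size=3, replace=False) for _ in range(num_rays)]
per_ray_alpha = [rng.uniform(0.1, 0.9, size=3) for _ in range(num_rays)]

def composite_weights(alpha):
    """Front-to-back compositing weights: w_i = a_i * prod_{j<i}(1 - a_j)."""
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alpha[:-1])])
    return alpha * transmittance

# MaskGS-style global regularizer: mean mask over all Gaussians,
# blind to where each Gaussian actually contributes in image space.
loss_global = mask.mean()

# SVR-GS-style spatially variant regularizer: render a per-pixel mask by
# aggregating each Gaussian's masked contribution along its ray, so the
# sparsity pressure is aligned with the per-ray reconstruction loss.
spatial_mask = np.array([
    np.sum(composite_weights(a) * mask[idx])
    for idx, a in zip(per_ray_hits, per_ray_alpha)
])
loss_spatial = spatial_mask.mean()

print(f"global mask loss:  {loss_global:.4f}")
print(f"spatial mask loss: {loss_spatial:.4f}")
```

Because the compositing weights sum to at most one per ray, each per-pixel mask value stays in [0, 1]; penalizing its mean pushes down exactly those Gaussians whose visible contribution is small.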