AI Summary
This work addresses the noise introduced when inconsistent multi-view 2D features are lifted into 3D semantic Gaussians by proposing a lightweight neural regularization method. It applies a variance-aware conditional MLP directly to the 3D Gaussians, leveraging their geometric and appearance attributes to correct semantic errors in 3D space, without the lengthy preprocessing or expensive optimization that earlier approaches rely on. Experiments on several datasets show that the approach improves the accuracy of the lifted 3D semantic field, yielding more robust semantic representations for downstream tasks.
Abstract
We propose a neural regularization method that refines the noisy 3D semantic field produced by lifting multi-view-inconsistent 2D features, in order to obtain accurate and robust 3D semantic Gaussian Splatting. The 2D features extracted from vision foundation models are inconsistent across views because they are computed without cross-view constraints. Lifting these inconsistent features directly into 3D Gaussians results in a noisy semantic field, which degrades the performance of downstream tasks. Previous methods either focus on obtaining consistent multi-view features in a preprocessing stage or aim to mitigate noise through improved optimization strategies, often at the cost of increased preprocessing time or expensive computational overhead. In contrast, we introduce a variance-aware conditional MLP that operates directly on the 3D Gaussians, leveraging their geometric and appearance attributes to correct semantic errors in 3D space. Experiments on different datasets show that our method enhances the accuracy of the lifted semantics, providing an efficient and effective approach to robust 3D semantic Gaussian Splatting.
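The abstract does not spell out the architecture, so the following is only a minimal sketch of what a variance-aware conditional MLP over per-Gaussian data could look like, not the authors' implementation. Every name and design choice here is an assumption made for illustration: the functions `init_mlp` and `refine_semantics`, the layer sizes, the residual formulation, the `var / (1 + var)` gate, and the use of a plain mean and variance over views in place of the rendering-weighted lifting a real pipeline would use.

```python
import numpy as np


def init_mlp(rng, in_dim, hidden_dim, out_dim):
    """Parameters of a two-layer MLP (hypothetical sizes, not from the paper)."""
    return {
        "w1": rng.normal(0.0, np.sqrt(2.0 / in_dim), (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        # Zero-initialised output layer: the untrained correction is exactly zero.
        "w2": np.zeros((hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def refine_semantics(params, sem_mean, sem_var, attrs):
    """Return corrected per-Gaussian semantic features.

    sem_mean: (N, D) lifted semantic feature of each Gaussian
    sem_var:  (N, D) variance of that feature across the views observing it
    attrs:    (N, A) geometric and appearance attributes (assumed layout,
              e.g. position, scale, rotation, opacity, colour)
    """
    # Condition the MLP on the noisy feature, its variance, and the attributes.
    x = np.concatenate([sem_mean, sem_var, attrs], axis=1)
    hidden = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU
    residual = hidden @ params["w2"] + params["b2"]
    # Assumed gate in [0, 1): Gaussians whose views agree are left untouched,
    # Gaussians with high cross-view variance receive a larger correction.
    mean_var = sem_var.mean(axis=1, keepdims=True)
    gate = mean_var / (1.0 + mean_var)
    return sem_mean + gate * residual


# Toy data: 5 views, 4 Gaussians, 3-dimensional semantic features.
rng = np.random.default_rng(0)
observations = rng.normal(size=(5, 4, 3))
observations[:, 0, :] = 1.0  # Gaussian 0 is perfectly consistent across views
sem_mean = observations.mean(axis=0)
sem_var = observations.var(axis=0)
attrs = rng.normal(size=(4, 7))

params = init_mlp(rng, in_dim=3 + 3 + 7, hidden_dim=16, out_dim=3)
refined = refine_semantics(params, sem_mean, sem_var, attrs)
```

In this sketch the correction is a residual scaled by the variance gate, so a Gaussian whose features agree across views keeps its lifted value, while the MLP weights would still need to be trained against a semantic loss that the abstract does not describe.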