🤖 AI Summary
This work addresses the inconsistency in multi-view feature lifting—e.g., DINO and CLIP features—within splat-based 3D representations. We propose a unified sparse linear inverse modeling framework, the first to formulate feature lifting as an analytically solvable linear inverse problem. Our approach incorporates Tikhonov regularization and post-lifting aggregation to ensure numerical stability and semantic fidelity, while soft diagonal dominance constraints and feature-clustering filtering enable efficient closed-form solutions. The method is kernel- and feature-structure-agnostic, ensuring strong generalizability. Evaluated on open-vocabulary 3D segmentation, it achieves state-of-the-art performance, significantly outperforming learned, grouped, and heuristic forward-lifting baselines. Processing a single scene takes only a few minutes.
📝 Abstract
Feature lifting has emerged as a crucial component in 3D scene understanding, enabling the attachment of rich image feature descriptors (e.g., DINO, CLIP) onto splat-based 3D representations. The core challenge lies in optimally assigning rich general attributes to 3D primitives while addressing the inconsistency issues from multi-view images. We present a unified, kernel- and feature-agnostic formulation of the feature lifting problem as a sparse linear inverse problem, which can be solved efficiently in closed form. Our approach admits a provable upper bound on the error relative to the global optimum under convex losses, delivering high-quality lifted features. To address inconsistencies and noise in multi-view observations, we introduce two complementary regularization strategies to stabilize the solution and enhance semantic fidelity. Tikhonov Guidance enforces numerical stability through soft diagonal dominance, while Post-Lifting Aggregation filters noisy inputs via feature clustering. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on open-vocabulary 3D segmentation benchmarks, outperforming training-based, grouping-based, and heuristic forward-lifting baselines while producing the lifted features in minutes. Code is available at https://github.com/saliteta/splat-distiller.git, and a project page is available at https://splat-distiller.pages.dev/.
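To make the closed-form formulation concrete, the sketch below shows a generic Tikhonov-regularized linear inverse solve of the kind the abstract describes. All names here are illustrative assumptions, not the paper's actual API: `W` stands in for a (here dense, in practice sparse) rendering-weight matrix mapping N splats to P pixels, `F` for per-pixel 2D features (e.g., DINO/CLIP), and the `lam * I` term is a simple stand-in for the paper's soft-diagonal-dominance Tikhonov Guidance.

```python
import numpy as np

def lift_features(W: np.ndarray, F: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Illustrative closed-form lifting sketch (not the paper's implementation).

    W   : (P, N) rendering weights from N splats to P pixels.
    F   : (P, d) per-pixel image features.
    lam : Tikhonov weight; adding lam * I keeps the normal matrix
          well-conditioned even when W^T W is nearly singular.
    Returns the (N, d) lifted per-splat features minimizing
    ||W X - F||^2 + lam * ||X||^2.
    """
    n = W.shape[1]
    A = W.T @ W + lam * np.eye(n)   # regularized normal matrix
    b = W.T @ F                      # back-projected features
    return np.linalg.solve(A, b)     # one closed-form solve, no training

# Tiny sanity check: with identity weights and no regularization,
# each "splat" simply recovers its own pixel feature.
F = np.arange(6, dtype=float).reshape(3, 2)
X = lift_features(np.eye(3), F, lam=0.0)
```

A real pipeline would use sparse matrices (e.g., `scipy.sparse`) and a sparse solver, since each pixel only touches a handful of splats.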