Splat Feature Solver

📅 2025-08-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses multi-view inconsistency when lifting image features (e.g., DINO, CLIP) onto splat-based 3D representations. It formulates feature lifting as a unified sparse linear inverse problem that can be solved efficiently in closed form. Two complementary regularization strategies stabilize the solution and improve semantic fidelity: Tikhonov Guidance enforces numerical stability through soft diagonal dominance, and Post-Lifting Aggregation filters noisy inputs via feature clustering. The formulation is kernel- and feature-agnostic. On open-vocabulary 3D segmentation benchmarks, it achieves state-of-the-art performance, outperforming training-based, grouping-based, and heuristic-forward baselines, and it produces the lifted features in minutes.

📝 Abstract
Feature lifting has emerged as a crucial component in 3D scene understanding, enabling the attachment of rich image feature descriptors (e.g., DINO, CLIP) onto splat-based 3D representations. The core challenge lies in optimally assigning rich general attributes to 3D primitives while addressing the inconsistency issues from multi-view images. We present a unified, kernel- and feature-agnostic formulation of the feature lifting problem as a sparse linear inverse problem, which can be solved efficiently in closed form. Our approach admits a provable upper bound on the global optimal error under convex losses for delivering high-quality lifted features. To address inconsistencies and noise in multi-view observations, we introduce two complementary regularization strategies to stabilize the solution and enhance semantic fidelity. Tikhonov Guidance enforces numerical stability through soft diagonal dominance, while Post-Lifting Aggregation filters noisy inputs via feature clustering. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on open-vocabulary 3D segmentation benchmarks, outperforming training-based, grouping-based, and heuristic-forward baselines while producing the lifted features in minutes. Code is available at https://github.com/saliteta/splat-distiller.git. See also https://splat-distiller.pages.dev/
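To make the "sparse linear inverse problem" framing concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes each observed pixel feature is a weighted sum of per-primitive features, so that stacking all pixels gives A F = B, and it solves the Tikhonov-regularized least-squares problem in closed form. The names `lift_features`, `A`, `B`, `F_true`, and `lam`, and the random toy data, are hypothetical. A dense NumPy matrix stands in for what would be a sparse matrix at real scene sizes.

```python
import numpy as np

def lift_features(A, B, lam=1e-3):
    """Closed-form solution of min_F ||A F - B||^2 + lam * ||F||^2.

    A: (num_pixels, num_splats) blending weights (assumed, hypothetical).
    B: (num_pixels, feat_dim) observed 2D features.
    Returns F: (num_splats, feat_dim) lifted per-primitive features.
    """
    num_splats = A.shape[1]
    gram = A.T @ A + lam * np.eye(num_splats)  # regularized normal equations
    return np.linalg.solve(gram, A.T @ B)

# Toy data: 200 pixels, 10 primitives, 4-dimensional features.
rng = np.random.default_rng(0)
num_pixels, num_splats, feat_dim = 200, 10, 4
mask = rng.random((num_pixels, num_splats)) < 0.3   # each pixel sees few primitives
A = rng.random((num_pixels, num_splats)) * mask     # non-negative, mostly zero weights
F_true = rng.normal(size=(num_splats, feat_dim))    # ground-truth features to recover
B = A @ F_true                                      # noise-free observations

F_hat = lift_features(A, B, lam=1e-8)
print(np.abs(F_hat - F_true).max())
```

With noise-free observations and a tiny regularization weight, the recovered features should match the ground truth closely; with inconsistent multi-view observations, a larger `lam` trades some bias for stability.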

Problem

Research questions and friction points this paper is trying to address.

Optimally assigning rich image features to 3D primitives
Addressing inconsistency issues from multi-view images
Solving feature lifting as a sparse linear inverse problem

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse linear inverse problem formulation
Tikhonov Guidance for numerical stability
Post-Lifting Aggregation via feature clustering
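
The link between Tikhonov regularization, diagonal dominance, and numerical stability can be illustrated with a small numerical sketch. It is an illustration only, not the paper's Tikhonov Guidance procedure, whose details are not given here. It assumes a hypothetical weight matrix `A` in which two primitives receive almost identical weights, which makes the Gram matrix A^T A nearly singular. Adding `lam` to the diagonal raises every diagonal entry relative to its row's off-diagonal mass and lowers the condition number. The helper `diag_dominance_ratio` is a name introduced for this example.

```python
import numpy as np

def diag_dominance_ratio(G):
    """Smallest ratio of |G_ii| to the sum of off-diagonal magnitudes in row i."""
    diag = np.abs(np.diag(G))
    off = np.abs(G).sum(axis=1) - diag
    return float(np.min(diag / off))

rng = np.random.default_rng(0)
A = rng.random((50, 6))                       # hypothetical blending weights
A[:, 5] = A[:, 4] + 1e-6 * rng.random(50)     # two nearly indistinguishable primitives

G = A.T @ A                                   # Gram matrix of the normal equations
lam = 1e-2                                    # assumed regularization weight
G_reg = G + lam * np.eye(G.shape[0])          # Tikhonov term added on the diagonal

cond_plain = np.linalg.cond(G)
cond_reg = np.linalg.cond(G_reg)
ratio_plain = diag_dominance_ratio(G)
ratio_reg = diag_dominance_ratio(G_reg)

print(f"condition number: {cond_plain:.3e} -> {cond_reg:.3e}")
print(f"diagonal dominance ratio: {ratio_plain:.4f} -> {ratio_reg:.4f}")
```

The regularized system is better conditioned, so the closed-form solve is less sensitive to noise in the observations.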