Structurally Disentangled Feature Fields Distillation for 3D Understanding and Editing

📅 2025-02-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of learning structured 3D features from 2D image supervision alone. We propose a feature-distillation framework built on multiple disentangled implicit feature fields. Our method explicitly decomposes the 3D feature field into two components, one view-independent (geometric and semantic) and one view-dependent (e.g., reflectance), enforced by structural decoupling constraints and 2D–3D cross-dimensional consistency regularization. The model is optimized end to end using only pretrained 2D vision features, without any 3D annotations. Unlike conventional single-field volumetric representations, our approach supports fine-grained, interactive editing of semantic and physical attributes (e.g., specular reflection removal). We evaluate it on 3D segmentation and demonstrate click-driven interactive 3D segmentation, attribute editing, and removal of view-dependent effects.

📝 Abstract

Recent work has demonstrated the ability to leverage or distill 2D features obtained from large pre-trained 2D models into 3D features, enabling impressive 3D editing and understanding capabilities using only 2D supervision. Although impressive, these models assume that 3D features are captured by a single feature field and often make the simplifying assumption that features are view-independent. In this work, we propose instead to capture 3D features using multiple disentangled feature fields that represent different structural components of 3D features, namely view-dependent and view-independent components, and that can be learned from 2D feature supervision only. Subsequently, each component can be controlled in isolation, enabling semantic and structural understanding and editing capabilities. For instance, using a user click, one can segment the 3D features corresponding to a given object and then segment, edit, or remove its view-dependent (reflective) properties. We evaluate our approach on the task of 3D segmentation and demonstrate a set of novel understanding and editing tasks.
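
As a rough illustration of the idea in the abstract, the sketch below renders a per-ray feature from two fields and compares it to a 2D teacher feature. It is a minimal sketch under stated assumptions, not the paper's implementation: the additive combination of the two components, the function names (`composite_weights`, `render_feature`, `distill_loss`), and the mean-squared-error loss are all hypothetical choices made for illustration.

```python
import numpy as np

def composite_weights(sigma, delta):
    """Standard volume-rendering weights for S samples along one ray."""
    alpha = 1.0 - np.exp(-sigma * delta)  # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return alpha * trans

def render_feature(sigma, delta, f_indep, f_dep, keep_dep=True):
    """Render one feature vector per ray from two feature fields.

    f_indep: (S, D) view-independent features at the S samples.
    f_dep:   (S, D) view-dependent features at the same samples.
    ASSUMPTION: the two components are combined by simple addition.
    """
    w = composite_weights(sigma, delta)
    feat = f_indep + f_dep if keep_dep else f_indep
    return w @ feat  # weighted sum over samples, shape (D,)

def distill_loss(rendered, teacher):
    """ASSUMPTION: mean-squared error against a 2D teacher feature."""
    return float(np.mean((rendered - teacher) ** 2))
```

Under these assumptions, calling `render_feature(..., keep_dep=False)` drops the view-dependent component at render time, which mirrors the kind of isolated control (for example, removing reflective properties) that the abstract describes.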

Problem

Research questions and friction points this paper is trying to address.

Disentangling 3D features into multiple fields

Enhancing 3D understanding with 2D supervision

Enabling isolated control for editing 3D components
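
One common way to turn a user click into a segmentation over distilled features is to threshold cosine similarity against the clicked feature. The sketch below shows that generic technique; whether the paper uses exactly this rule is an assumption, and the function name `click_segment` and the default `threshold` value are hypothetical.

```python
import numpy as np

def click_segment(features, click_index, threshold=0.8):
    """Return a boolean mask of entries similar to the clicked one.

    features:    (N, D) array of per-point (or per-pixel) features.
    click_index: row index of the feature the user clicked on.
    ASSUMPTION: similarity is cosine similarity with a fixed threshold.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-8)  # avoid division by zero
    sim = unit @ unit[click_index]             # cosine similarity to the click
    return sim >= threshold
```

The resulting mask could then select which points have their view-dependent component edited or removed, while the rest of the scene is left unchanged.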

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiple disentangled feature fields

View-dependent and view-independent components

2D supervision for 3D features

🔎 Similar Papers

GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians