Scene-Agnostic Object-Centric Representation Learning for 3D Gaussian Splatting

📅 2026-04-10

🤖 AI Summary
This work addresses the limited cross-scene generalizability of existing instance-level segmentation methods for 3D Gaussian splatting, which typically rely on scene-specific supervision and intricate mask processing. To overcome this, the study introduces unsupervised object-centric learning into 3D Gaussian splatting and proposes a scene-agnostic object codebook. Built on a pretrained slot attention-based Global Object-Centric Learning (GOCL) module, the codebook is combined with the module's unsupervised object masks to directly supervise the instance identity features of 3D Gaussians. This design removes the need for per-scene fine-tuning, mask pre-/post-processing, or explicit multi-view alignment, and yields more structured representations with better cross-scene generalization. The authors position these representations as support for downstream applications such as robotic interaction and scene understanding.
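To illustrate why a dataset-level codebook gives consistent identities, here is a minimal sketch. It assumes each slot from the pretrained slot attention module is summarized by an embedding vector and assigned to its nearest codeword by cosine similarity. The codebook values, the 2-D embedding size, and the function `assign_codebook_ids` are hypothetical; the paper's actual matching rule is not described on this page.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def assign_codebook_ids(slots: List[Vector], codebook: List[Vector]) -> List[int]:
    """Hypothetical helper: map each slot embedding to its most similar codeword index.

    The codebook is shared across the whole dataset rather than learned per scene,
    so slots whose embeddings fall near the same codeword receive the same index
    regardless of which scene or view they came from.
    """
    ids = []
    for s in slots:
        sims = [cosine(s, c) for c in codebook]
        ids.append(max(range(len(codebook)), key=lambda k: sims[k]))
    return ids


# Hypothetical 3-entry codebook in a toy 2-D embedding space.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# Slot embeddings from two different scenes (illustrative values only).
scene_a = [[0.9, 0.1], [0.1, 0.95]]
scene_b = [[0.05, 0.8], [-0.7, 0.2], [0.8, -0.1]]

print(assign_codebook_ids(scene_a, codebook))  # → [0, 1]
print(assign_codebook_ids(scene_b, codebook))  # → [1, 2, 0]
```

Because the lookup involves no per-scene state, a new scene can be labeled without fine-tuning, which mirrors the "no per-scene retraining" claim at the level of this toy example.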

📝 Abstract
Recent works on 3D scene understanding leverage 2D masks from visual foundation models (VFMs) to supervise radiance fields, enabling instance-level 3D segmentation. However, the supervision signals from foundation models are not fundamentally object-centric and often require additional mask pre-/post-processing or specialized training and loss design to resolve mask identity conflicts across views. The learned identities are also scene-dependent, limiting generalizability across scenes. We therefore propose a dataset-level, object-centric supervision scheme to learn object representations in 3D Gaussian Splatting (3DGS). Building on a pre-trained slot attention-based Global Object-Centric Learning (GOCL) module, we learn a scene-agnostic object codebook that provides consistent, identity-anchored representations across views and scenes. By coupling the codebook with the module's unsupervised object masks, we can directly supervise the identity features of 3D Gaussians without additional mask pre-/post-processing or explicit multi-view alignment. The learned scene-agnostic codebook enables object supervision and identification without per-scene fine-tuning or retraining. Our method thus introduces unsupervised object-centric learning (OCL) into 3DGS, yielding more structured representations and better generalization for downstream tasks such as robotic interaction, scene understanding, and cross-scene generalization.
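The idea of supervising Gaussian identity features with codebook-anchored masks can be sketched as follows. This is a minimal sketch under two assumptions. First, a per-pixel identity feature is rendered by standard 3DGS front-to-back alpha compositing of per-Gaussian identity features. Second, the loss is a cross-entropy between that feature's similarities to the codebook and the codebook index of the slot mask covering the pixel. The loss form, the temperature, and the names `composite_feature` and `identity_loss` are illustrative assumptions, not the paper's stated objective.

```python
import math
from typing import List, Sequence


def composite_feature(features: List[Sequence[float]], alphas: List[float]) -> List[float]:
    """Front-to-back alpha compositing: F = sum_i f_i * a_i * prod_{j<i} (1 - a_j).

    `features` and `alphas` belong to the Gaussians overlapping one pixel,
    already sorted by depth (nearest first).
    """
    out = [0.0] * len(features[0])
    transmittance = 1.0  # fraction of light not yet absorbed by nearer Gaussians
    for f, a in zip(features, alphas):
        w = a * transmittance
        for d in range(len(out)):
            out[d] += w * f[d]
        transmittance *= 1.0 - a
    return out


def identity_loss(pixel_feature: Sequence[float], codebook: List[Sequence[float]],
                  target_id: int, temperature: float = 0.1) -> float:
    """Assumed loss: cross-entropy of a softmax over dot products with each codeword.

    `target_id` is the codebook index of the (unsupervised) slot mask that
    covers this pixel, so every view and scene shares the same label space.
    """
    logits = [sum(p * c for p, c in zip(pixel_feature, code)) / temperature
              for code in codebook]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target_id]


# Toy example: two Gaussians overlap a pixel; the nearer one dominates.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
pixel = composite_feature([[1.0, 0.0], [0.0, 1.0]], alphas=[0.6, 0.5])
print(pixel)  # → [0.6, 0.2]
print(identity_loss(pixel, codebook, target_id=0) < identity_loss(pixel, codebook, target_id=1))  # → True
```

In a real pipeline the gradient of such a loss would flow back through the compositing weights into each Gaussian's identity feature; this pure-Python version only shows the forward computation.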
Problem

Research questions and friction points this paper is trying to address.

object-centric representation
3D Gaussian Splatting
scene-agnostic learning
mask identity conflict
cross-scene generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

object-centric learning
3D Gaussian Splatting
scene-agnostic representation
slot attention
codebook