🤖 AI Summary
Existing 3D Gaussian Splatting (3DGS) methods do not explicitly model underlying 3D semantics, which limits their controllability and interpretability. To address this, we propose 3DisGS, to our knowledge the first unsupervised interpretable single-view 3DGS framework, built on hierarchical Disentangled Representation Learning (DRL). A dual-branch architecture, consisting of a point cloud initialization branch and a triplane-Gaussian generation branch, achieves coarse-grained disentanglement by separating 3D geometry from visual appearance. DRL-based encoder-adapters then discover fine-grained semantic representations within each of these two modalities, without additional annotations. Experiments indicate that the approach preserves high-quality, rapid reconstruction while enabling independent geometry and appearance editing and semantics-driven manipulation, which improves model interpretability and generative controllability.
📝 Abstract
Gaussian Splatting (GS) has recently marked a significant advance in 3D reconstruction, delivering both rapid rendering and high-quality results. However, existing 3DGS methods struggle to capture underlying 3D semantics, which hinders model controllability and interpretability. To address this, we propose an interpretable single-view 3DGS framework, termed 3DisGS, that discovers both coarse- and fine-grained 3D semantics via hierarchical disentangled representation learning (DRL). Specifically, the model employs a dual-branch architecture, consisting of a point cloud initialization branch and a triplane-Gaussian generation branch, to achieve coarse-grained disentanglement by separating 3D geometry features from visual appearance features. Fine-grained semantic representations within each modality are then discovered through DRL-based encoder-adapters. To our knowledge, this is the first work to achieve unsupervised interpretable 3DGS. Evaluations indicate that our model achieves 3D disentanglement while preserving high-quality, rapid reconstruction.
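To make the idea of coarse-grained disentanglement concrete, the toy sketch below shows what separating geometry from appearance buys in practice: if Gaussian positions depend only on a geometry code and colors only on an appearance code, the two can be edited independently. This is not the 3DisGS architecture. The names `DisentangledCode`, `Gaussian`, `decode`, and `swap_appearance` are hypothetical, and the hand-written decoder stands in for the learned branches purely for illustration.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DisentangledCode:
    """Hypothetical latent code split into two independent groups of factors."""
    geometry: tuple    # factors assumed to control 3D shape only
    appearance: tuple  # factors assumed to control color only (three values)


@dataclass(frozen=True)
class Gaussian:
    position: tuple  # 3D mean of the Gaussian
    color: tuple     # RGB in [0, 1]


def decode(code, num_gaussians=4, seed=0):
    """Toy decoder: positions use only code.geometry, colors only code.appearance."""
    rng = random.Random(seed)  # fixed seed gives the same base point layout each call
    base = [
        (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        for _ in range(num_gaussians)
    ]
    scale = 1.0 + sum(code.geometry)  # geometry factors rescale the point layout
    r, g, b = (min(max(v, 0.0), 1.0) for v in code.appearance[:3])  # clamp to [0, 1]
    return [Gaussian(tuple(scale * x for x in p), (r, g, b)) for p in base]


def swap_appearance(target, source):
    """Return target's geometry combined with source's appearance."""
    return replace(target, appearance=source.appearance)


# Usage: transfer the appearance of object B onto the geometry of object A.
code_a = DisentangledCode(geometry=(0.1, 0.2), appearance=(0.9, 0.1, 0.1))
code_b = DisentangledCode(geometry=(0.5, -0.3), appearance=(0.1, 0.1, 0.9))
edited = swap_appearance(code_a, code_b)

gaussians_a = decode(code_a)
gaussians_edited = decode(edited)
```

In this sketch the edited object keeps every Gaussian position of A while taking its colors from B. In a learned model that separation is not guaranteed by construction; encouraging it is the purpose of the disentanglement objective.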