InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception

📅 2024-11-28
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
To address the imbalance between appearance and semantic representation, inconsistent object boundary modeling, and the over- or under-segmentation caused by top-down instance segmentation in 3D Gaussian Splatting, this paper proposes InstanceGaussian, a category-agnostic, point-level 3D instance segmentation framework. The method makes three contributions: (1) a Semantic-Scaffold-GS representation that balances appearance and semantics to improve feature representations and boundary delineation; (2) a progressive appearance-semantic joint training strategy that improves stability and segmentation accuracy; and (3) a bottom-up instance aggregation approach based on farthest point sampling and connected component analysis, which removes the reliance on predefined categories. The authors report state-of-the-art performance in category-agnostic, open-vocabulary 3D point-level segmentation.

📝 Abstract
3D scene understanding has become an essential area of research with applications in autonomous driving, robotics, and augmented reality. Recently, 3D Gaussian Splatting (3DGS) has emerged as a powerful approach, combining explicit modeling with neural adaptability to provide efficient and detailed scene representations. However, three major challenges remain in leveraging 3DGS for scene understanding: 1) an imbalance between appearance and semantics, where dense Gaussian usage for fine-grained texture modeling does not align with the minimal requirements for semantic attributes; 2) inconsistencies between appearance and semantics, as purely appearance-based Gaussians often misrepresent object boundaries; and 3) reliance on top-down instance segmentation methods, which struggle with uneven category distributions, leading to over- or under-segmentation. In this work, we propose InstanceGaussian, a method that jointly learns appearance and semantic features while adaptively aggregating instances. Our contributions include: i) a novel Semantic-Scaffold-GS representation balancing appearance and semantics to improve feature representations and boundary delineation; ii) a progressive appearance-semantic joint training strategy to enhance stability and segmentation accuracy; and iii) a bottom-up, category-agnostic instance aggregation approach that addresses segmentation challenges through farthest point sampling and connected component analysis. Our approach achieves state-of-the-art performance in category-agnostic, open-vocabulary 3D point-level segmentation, highlighting the effectiveness of the proposed representation and training strategies. Project page: https://lhj-git.github.io/InstanceGaussian/
Problem

Research questions and friction points this paper is trying to address.

Balancing appearance and semantics in 3D Gaussian representations
Resolving inconsistencies between appearance and object boundaries
Improving instance segmentation with category-agnostic aggregation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint appearance-semantic Gaussian representation for 3D perception
Progressive joint training strategy for segmentation accuracy
Bottom-up instance aggregation via farthest point sampling
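
The bottom-up aggregation idea can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the cosine-similarity threshold `sim_thresh`, and the rule of assigning each point to its most similar seed are all assumptions chosen for illustration. The sketch picks seed points by farthest point sampling, links seeds whose per-point features are similar, and treats each connected component of that seed graph as one instance, so no category list is needed.

```python
import numpy as np

def farthest_point_sampling(points, k, start=0):
    """Return indices of up to k points chosen greedily to be far apart."""
    n = points.shape[0]
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(1, min(k, n)):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def connected_components(adj):
    """Label connected components of a symmetric boolean adjacency matrix."""
    n = adj.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for s in range(n):
        if labels[s] != -1:
            continue
        # Depth-first traversal from an unlabeled node.
        labels[s] = current
        stack = [s]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(adj[u]):
                if labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels

def aggregate_instances(xyz, feat, num_seeds=8, sim_thresh=0.9):
    """Hypothetical aggregation: FPS seeds merged by feature similarity."""
    seeds = farthest_point_sampling(xyz, num_seeds)
    # Unit-normalize features so dot products are cosine similarities.
    norms = np.clip(np.linalg.norm(feat, axis=1, keepdims=True), 1e-12, None)
    unit = feat / norms
    seed_feat = unit[seeds]
    # Seeds with similar features are assumed to lie on the same instance.
    seed_adj = (seed_feat @ seed_feat.T) >= sim_thresh
    seed_labels = connected_components(seed_adj)
    # Each point inherits the instance label of its most similar seed.
    nearest_seed = np.argmax(unit @ seed_feat.T, axis=1)
    return seed_labels[nearest_seed]
```

Here `xyz` is an `(N, 3)` array of point or Gaussian centers and `feat` is an `(N, D)` array of learned per-point features. A real pipeline would likely also use spatial proximity when linking seeds; this sketch merges on feature similarity alone to keep the two named steps visible.
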
Haijie Li
School of Electronic and Computer Engineering, Peking University, China
Yanmin Wu
School of Electronic and Computer Engineering, Peking University, China
Jiarui Meng
Peking University
3D Reconstruction, 3D Vision
Qiankun Gao
School of Electronic and Computer Engineering, Peking University, China
Zhiyao Zhang
The Ohio State University
Optimization, Reinforcement Learning, Bandits
Ronggang Wang
Shenzhen Graduate School, Peking University
Immersive Video Coding and Processing
Jian Zhang
School of Electronic and Computer Engineering, Peking University, China; Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Peking University Shenzhen Graduate School, China