Learning Part-aware 3D Representations by Fusing 2D Gaussians and Superquadrics

📅 2024-08-20
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the lack of semantic-structural interpretability in multi-view 3D scene parsing. We propose an unsupervised part-level 3D representation learning framework that jointly optimizes geometric structure decomposition and appearance reconstruction. Our method innovatively integrates 2D Gaussian splatting with superquadric geometric priors to establish a part-aware hybrid representation paradigm. Multi-view geometric constraints—such as epipolar consistency and depth coherence—are incorporated to enable end-to-end unsupervised training without 3D supervision. Evaluated on DTU, ShapeNet, and real-world multi-view datasets, our approach significantly outperforms state-of-the-art methods. It achieves high-fidelity novel-view synthesis while, for the first time, enabling interpretable and editable semantic part-level 3D reconstruction. The resulting structured, component-wise 3D representations provide a principled foundation for downstream tasks including 3D editing, reasoning, and physics-based simulation.

📝 Abstract
Low-level 3D representations, such as point clouds, meshes, NeRFs, and 3D Gaussians, are commonly used to represent 3D objects or scenes. However, human perception typically understands 3D objects at a higher level as a composition of parts or structures rather than points or voxels. Representing 3D objects or scenes as semantic parts can benefit further understanding and applications. In this paper, we introduce $\textbf{PartGS}$, $\textbf{part}$-aware 3D reconstruction by a hybrid representation of 2D $\textbf{G}$aussians and $\textbf{S}$uperquadrics, which parses objects or scenes into semantic parts, digging 3D structural clues from multi-view image inputs. Accurate structured geometry reconstruction and high-quality rendering are achieved at the same time. Our method simultaneously optimizes superquadric meshes and Gaussians by coupling their parameters within our hybrid representation. On one hand, this hybrid representation inherits the advantage of superquadrics to represent different shape primitives, supporting flexible part decomposition of scenes. On the other hand, 2D Gaussians capture complex texture and geometry details, ensuring high-quality appearance and geometry reconstruction. Our method is fully unsupervised and outperforms existing state-of-the-art approaches in extensive experiments on DTU, ShapeNet, and real-life datasets.
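The abstract uses superquadrics as the part-level shape primitives. As background, a superquadric is defined by the standard inside-outside function with two shape exponents and three axis scales; the sketch below is the textbook formulation (Barr's), not code from the paper, and the function name is illustrative:

```python
import numpy as np

def superquadric_io(points, scale, eps):
    """Inside-outside function of an axis-aligned superquadric at the origin.

    points : (N, 3) array of query points
    scale  : (a1, a2, a3) axis scales
    eps    : (eps1, eps2) shape exponents; eps = (1, 1) gives an ellipsoid,
             small exponents give box-like shapes.

    Returns F(x): F < 1 inside the primitive, F = 1 on its surface, F > 1 outside.
    """
    a1, a2, a3 = scale
    e1, e2 = eps
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # combine the x/y terms first, then the z term, per the standard formula
    xy = (np.abs(x / a1) ** (2.0 / e2) + np.abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return xy + np.abs(z / a3) ** (2.0 / e1)
```

With exponents (1, 1) and unit scales this reduces to the unit sphere: points at radius 1 evaluate to exactly 1, interior points to less than 1.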
Problem

Research questions and friction points this paper is trying to address.

Low-level 3D representations (point clouds, meshes, NeRFs, 3D Gaussians) lack part-level semantic structure
Decomposing objects or scenes into interpretable structural parts from multi-view images alone
Achieving accurate structured geometry and high-fidelity rendering simultaneously, without 3D supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised hybrid 3D representation learning
Combining 2D Gaussians and superquadrics
Joint optimization for part-aware decomposition
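The coupling the bullets describe — Gaussians parameterized on superquadric primitives so both are optimized jointly — can be illustrated with a minimal sketch: sampling the parametric superquadric surface to obtain candidate Gaussian center positions. This uses the standard parametric form of a superquadric; the function names and sampling density are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def spow(x, eps):
    # signed power sgn(x) * |x|^eps, the standard superquadric exponentiation
    return np.sign(x) * np.abs(x) ** eps

def sample_superquadric_surface(scale, eps, n_eta=16, n_omega=32):
    """Sample points on a superquadric surface via its parametric equations.

    scale : (a1, a2, a3) axis scales
    eps   : (eps1, eps2) shape exponents
    Returns an (n_eta * n_omega, 3) array of surface points, which could serve
    as initial centers for surface-attached Gaussians.
    """
    a1, a2, a3 = scale
    e1, e2 = eps
    # eta sweeps latitude, omega sweeps longitude; endpoints nudged to avoid poles
    eta = np.linspace(-np.pi / 2 + 1e-3, np.pi / 2 - 1e-3, n_eta)
    omega = np.linspace(-np.pi, np.pi, n_omega, endpoint=False)
    eta, omega = np.meshgrid(eta, omega, indexing="ij")
    x = a1 * spow(np.cos(eta), e1) * spow(np.cos(omega), e2)
    y = a2 * spow(np.cos(eta), e1) * spow(np.sin(omega), e2)
    z = a3 * spow(np.sin(eta), e1)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

Because the samples are functions of the superquadric parameters, gradients from a rendering loss on the Gaussians can flow back into the primitive's scale and shape exponents, which is the kind of parameter coupling the hybrid representation relies on.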
Zhirui Gao
National University of Defense Technology
Computer Vision, 3D Reconstruction, Differentiable Rendering
Renjiao Yi
National University of Defense Technology
Computer Graphics, 3D Vision
Yuhang Huang
National University of Defense Technology
Deep Learning, Computer Vision
Wei Chen
School of Computer, National University of Defense Technology, Changsha, 410073, China
Chenyang Zhu
School of Computer, National University of Defense Technology, Changsha, 410073, China
Kai Xu
School of Computer, National University of Defense Technology, Changsha, 410073, China