AlignGS: Aligning Geometry and Semantics for Robust Indoor Reconstruction from Sparse Views

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address high geometric ambiguity and weak semantic guidance in indoor 3D reconstruction from sparse views, this paper proposes the first end-to-end framework that jointly optimizes geometry and semantics. Methodologically, it distills semantic priors from 2D foundation models and introduces depth-consistency constraints alongside multi-faceted normal regularization, so that semantics actively guide geometric optimization. By unifying differentiable rendering with multi-view geometry, the framework performs semantically guided optimization of a 3D Gaussian splatting representation. Evaluated on standard benchmarks including ScanNet, the method achieves state-of-the-art performance in both novel-view synthesis and geometric reconstruction. Under sparse input it improves model completeness, reducing Chamfer distance by 12.3%, and improves rendering fidelity, increasing PSNR by 8.7%. These results support the claim that semantic priors strengthen the accuracy and robustness of geometric reconstruction.
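The two headline numbers refer to standard metrics: Chamfer distance measures how far reconstructed surface points lie from ground-truth points (lower is better), while PSNR measures the pixel-level fidelity of rendered novel views (higher is better). The sketch below computes both in plain Python under one common convention. Benchmarks differ (squared vs. unsquared distances, summing vs. averaging the two directions, point sampling density), so this is not necessarily the exact evaluation protocol used in the paper, and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _nearest_sq_dist(p: Point, cloud: Sequence[Point]) -> float:
    # Squared Euclidean distance from p to its nearest neighbour in cloud (brute force).
    return min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in cloud)


def chamfer_distance(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    # Symmetric Chamfer distance: mean nearest-neighbour squared distance
    # from prediction to ground truth, plus the same in the reverse direction.
    pred_to_gt = sum(_nearest_sq_dist(p, gt) for p in pred) / len(pred)
    gt_to_pred = sum(_nearest_sq_dist(q, pred) for q in gt) / len(gt)
    return pred_to_gt + gt_to_pred


def psnr(rendered: Sequence[float], reference: Sequence[float], max_val: float = 1.0) -> float:
    # PSNR in dB from the mean squared error over flattened pixel values in [0, max_val].
    mse = sum((r - t) ** 2 for r, t in zip(rendered, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

For example, a single predicted point at the origin against a single ground-truth point one unit away gives a Chamfer distance of 2.0 under this convention (1.0 in each direction).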


📝 Abstract
The demand for semantically rich 3D models of indoor scenes is rapidly growing, driven by applications in augmented reality, virtual reality, and robotics. However, creating them from sparse views remains a challenge due to geometric ambiguity. Existing methods often treat semantics as a passive feature painted on an already-formed, and potentially flawed, geometry. We posit that for robust sparse-view reconstruction, semantic understanding should instead be an active, guiding force. This paper introduces AlignGS, a novel framework that actualizes this vision by pioneering a synergistic, end-to-end optimization of geometry and semantics. Our method distills rich priors from 2D foundation models and uses them to directly regularize the 3D representation through a set of novel semantic-to-geometry guidance mechanisms, including depth consistency and multi-faceted normal regularization. Extensive evaluations on standard benchmarks demonstrate that our approach achieves state-of-the-art results in novel view synthesis and produces reconstructions with superior geometric accuracy. The results validate that leveraging semantic priors as a geometric regularizer leads to more coherent and complete 3D models from limited input views. Our code is available at https://github.com/MediaX-SJTU/AlignGS.
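The abstract does not specify the exact form of the depth-consistency term. Because monocular depth predicted by 2D foundation models is usually defined only up to an unknown scale and shift, one common choice in sparse-view Gaussian splatting is a Pearson-correlation loss between rendered depth and the prior, which is invariant to any positive affine rescaling of either map. The following is a minimal sketch under that assumption; `pearson_depth_loss` and its optional validity mask are hypothetical and not taken from the AlignGS code.

```python
import math
from typing import Optional, Sequence


def pearson_depth_loss(rendered: Sequence[float], prior: Sequence[float],
                       mask: Optional[Sequence[bool]] = None) -> float:
    """Hypothetical depth-consistency loss: 1 - Pearson correlation.

    Returns 0 when rendered depth is a positive affine function of the
    prior, and up to 2 when the two maps are perfectly anti-correlated.
    """
    # Keep only pixels selected by the (hypothetical) validity mask.
    if mask is None:
        mask = [True] * len(rendered)
    pairs = [(r, p) for r, p, m in zip(rendered, prior, mask) if m]
    n = len(pairs)
    if n < 2:
        return 0.0  # too few pixels to define a correlation
    mean_r = sum(r for r, _ in pairs) / n
    mean_p = sum(p for _, p in pairs) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in pairs)
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    var_p = sum((p - mean_p) ** 2 for _, p in pairs)
    denom = math.sqrt(var_r * var_p)
    if denom == 0:
        return 0.0  # constant depth map: correlation undefined, skip rather than divide by zero
    return 1.0 - cov / denom
```

In a full pipeline this would operate on per-pixel tensors and be added to the photometric loss with a tuned weight; flattened Python lists keep the example dependency-free.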
Problem

Research questions and friction points this paper is trying to address.

Aligning geometry and semantics for robust indoor reconstruction
Overcoming geometric ambiguity in sparse-view 3D reconstruction
Using semantic priors as geometric regularizers for complete models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synergistic end-to-end optimization of geometry and semantics
Semantic priors from 2D models regularize 3D representation
Novel semantic-to-geometry guidance mechanisms for reconstruction
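One way semantic priors could guide geometry is through surface normals. The sketch below is an illustration, not the paper's formulation: it combines two hypothetical terms, agreement between rendered normals and a 2D normal prior, and a planarity term that pulls the normals inside each semantic segment assumed to be planar (for example, wall or floor labels) toward the segment's mean normal. It simplifies by treating all pixels sharing a label as one plane; a real implementation would more likely work per connected region, since a room has several walls with different orientations.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical sketch; not the AlignGS implementation.
Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n) if n > 0 else (0.0, 0.0, 0.0)


def _cos(a: Vec3, b: Vec3) -> float:
    a, b = _normalize(a), _normalize(b)
    return sum(x * y for x, y in zip(a, b))


def normal_prior_loss(rendered: Sequence[Vec3], prior: Sequence[Vec3]) -> float:
    # Mean (1 - cosine similarity) between rendered normals and a 2D normal prior.
    return sum(1.0 - _cos(r, p) for r, p in zip(rendered, prior)) / len(rendered)


def semantic_planarity_loss(rendered: Sequence[Vec3], labels: Sequence[int],
                            planar_labels: Set[int]) -> float:
    # Group unit normals by semantic label, keeping only labels assumed planar.
    groups: Dict[int, List[Vec3]] = defaultdict(list)
    for n, lab in zip(rendered, labels):
        if lab in planar_labels:
            groups[lab].append(_normalize(n))
    total, count = 0.0, 0
    for normals in groups.values():
        # The segment's mean normal stands in for its plane orientation.
        mean = _normalize(tuple(sum(c) / len(normals) for c in zip(*normals)))
        for n in normals:
            total += 1.0 - sum(x * y for x, y in zip(n, mean))
            count += 1
    return total / count if count else 0.0
```

During training, terms like these would typically be weighted and summed with the photometric and depth losses; the weights are tuning choices that the source does not give.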
Yijie Gao
School of Information Science and Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
Houqiang Zhong
Shanghai Jiao Tong University
Tianchi Zhu
School of Information Science and Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China; SJTU Paris Elite Institute of Technology, Shanghai Jiao Tong University, Shanghai, China
Zhengxue Cheng
Assistant Researcher, Shanghai Jiao Tong University
Video and Image Coding, Computer Vision, Image Quality Assessment
Qiang Hu
Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China
Li Song
Professor of Electronic Engineering, Shanghai Jiao Tong University
Video Coding, Image Processing, Computer Vision