Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction

📅 2025-07-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Generic 3D Gaussian Splatting (3DGS) reconstruction suffers from high computational overhead, reliance on large-scale annotated data and pose priors, and joint regression of geometry and appearance—leading to slow convergence and poor generalization. To address these limitations, we propose a decoupled generic 3DGS reconstruction framework: a stereo-vision backbone extracts local features from image pairs, while global attention fuses multi-view information to separately predict point cloud structure and Gaussian attributes (position, scale, rotation, opacity, and spherical harmonic coefficients), enabling end-to-end, pose-free reconstruction. This design reduces GPU memory consumption by ~40% and training time by 35%, without compromising reconstruction quality. Our method achieves state-of-the-art performance on ScanNet and Mip-NeRF360 benchmarks, demonstrating superior efficiency, robustness, and scalability to real-world scenes.

Technology Category

Application Category

📝 Abstract
Generalizable 3D Gaussian Splatting reconstruction showcases advanced Image-to-3D content creation but requires substantial computational resources and large datasets, posing challenges to training models from scratch. Current methods usually entangle the prediction of 3D Gaussian geometry and appearance, which rely heavily on data-driven priors and result in slow regression speeds. To address this, we propose method, a disentangled framework for efficient 3D Gaussian prediction. Our method extracts features from local image pairs using a stereo vision backbone and fuses them via global attention blocks. Dedicated point and Gaussian prediction heads generate multi-view point-maps for geometry and Gaussian features for appearance, combined as GS-maps to represent the 3DGS object. A refinement network enhances these GS-maps for high-quality reconstruction. Unlike existing methods that depend on camera parameters, our approach achieves pose-free 3D reconstruction, improving robustness and practicality. By reducing resource demands while maintaining high-quality outputs, method provides an efficient, scalable solution for real-world 3D content generation.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational resources for 3D Gaussian Splatting reconstruction
Disentangling geometry and appearance prediction for faster regression
Achieving pose-free 3D reconstruction for improved robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangled framework for efficient 3D Gaussian prediction
Stereo vision backbone with global attention fusion
Pose-free 3D reconstruction enhancing robustness
🔎 Similar Papers
No similar papers found.