🤖 AI Summary
This work addresses sparse-view surface reconstruction from satellite images. In this setting, large illumination variations and weak or repetitive surface textures make multi-view geometric constraints sparse and unreliable. To overcome this, the authors propose a generalizable feed-forward reconstruction framework based on 2D Gaussian splatting. It predicts Gaussian attributes coarse-to-fine and explicitly models local geometric reliability at three levels: feature learning, Gaussian parameter estimation, and training optimization. Key components include a confidence-aware module that fuses monocular priors with multi-view matching features, a cross-stage self-consistency residual guidance mechanism, and a bidirectional confidence-routing loss that allocates geometric and appearance supervision differentially. Experiments on satellite datasets show improvements over representative generalizable baselines and competitive per-scene optimization methods in rendering quality, surface reconstruction accuracy, cross-dataset generalization, and inference efficiency.
📝 Abstract
Sparse-view surface reconstruction from satellite images remains highly challenging, fundamentally because the reliability of multi-view matching under satellite imaging conditions varies strongly across space. Large photometric differences, weak textures, and repetitive textures leave multi-view geometric constraints sparse, unevenly distributed, and locally unreliable. Although 2D Gaussian Splatting (2DGS) is better suited than 3D Gaussian Splatting (3DGS) to the explicit representation of continuous surfaces, generalizable feed-forward 2DGS frameworks for sparse-view satellite surface reconstruction have received little attention. To address this gap, we propose SatSurfGS, a generalizable 2DGS-based method for sparse-view surface reconstruction from satellite imagery. SatSurfGS predicts Gaussian attributes in a coarse-to-fine manner and explicitly models local geometric reliability at three levels: feature learning, Gaussian parameter estimation, and training optimization. Specifically, we propose (1) a confidence-aware monocular multi-view feature fusion module that adaptively integrates monocular priors and multi-view matching features according to local confidence; (2) a cross-stage self-consistency residual guidance module that stabilizes stage-wise Gaussian parameter refinement using confidence information together with the residual between the height map rendered at the previous stage and the current-stage MVS height map; and (3) a confidence-based bidirectional routing loss that allocates geometric and appearance supervision differentially. Experiments on satellite datasets show that SatSurfGS improves rendering quality, surface reconstruction accuracy, cross-dataset generalization, and inference efficiency compared with representative generalizable baselines and competitive per-scene optimization methods.
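The abstract describes the three confidence mechanisms only at a high level, so the following is an illustrative sketch under stated assumptions, not the SatSurfGS implementation. It assumes a per-pixel confidence `c` in [0, 1] and treats feature, height, and grayscale intensity maps as flat lists of floats. The function names (`fuse_features`, `residual_guidance`, `routing_loss`) and the specific rules (a linear confidence blend for fusion, a (residual, confidence) pair as guidance, and an L1 loss that weights geometry by `c` and appearance by `1 - c`) are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def fuse_features(f_mono: Sequence[float], f_mvs: Sequence[float],
                  conf: Sequence[float]) -> List[float]:
    """Hypothetical confidence-aware fusion: per pixel, lean on the multi-view
    (MVS) feature where matching confidence is high and fall back to the
    monocular prior where it is low, via a linear blend."""
    return [c * m + (1.0 - c) * p for p, m, c in zip(f_mono, f_mvs, conf)]


def residual_guidance(h_render_prev: Sequence[float],
                      h_mvs_curr: Sequence[float],
                      conf: Sequence[float]) -> List[Tuple[float, float]]:
    """Hypothetical cross-stage guidance: per pixel, pair the residual
    (previous-stage rendered height minus current-stage MVS height) with the
    confidence, as an input a refinement stage could consume."""
    return [(r - m, c) for r, m, c in zip(h_render_prev, h_mvs_curr, conf)]


def routing_loss(h_pred: Sequence[float], h_mvs: Sequence[float],
                 i_pred: Sequence[float], i_gt: Sequence[float],
                 conf: Sequence[float], lam: float = 1.0) -> float:
    """Hypothetical bidirectional routing: confident pixels weight the
    geometric (height) L1 term, unconfident pixels weight the appearance
    (intensity) L1 term. `lam` balances the two terms."""
    n = len(conf)
    if n == 0:
        return 0.0
    geo = sum(c * abs(h - t) for h, t, c in zip(h_pred, h_mvs, conf)) / n
    app = sum((1.0 - c) * abs(p - g) for p, g, c in zip(i_pred, i_gt, conf)) / n
    return geo + lam * app


if __name__ == "__main__":
    conf = [1.0, 0.0, 0.5]
    print(fuse_features([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], conf))
    print(residual_guidance([10.0, 12.0, 5.0], [9.0, 12.0, 7.0], conf))
    print(routing_loss([10.0, 12.0, 5.0], [9.0, 12.0, 7.0],
                       [0.2, 0.4, 0.6], [0.2, 0.1, 0.6], conf))
```

With `conf = [1.0, 0.0, 0.5]`, the fusion returns the MVS feature, the monocular feature, and their average, respectively. A real model would operate on batched tensors with learned confidence maps, and the linear blend and L1 weighting here are placeholders rather than the paper's formulation.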