🤖 AI Summary
This study addresses the limited accuracy and structural detail of digital surface models (DSMs) generated from satellite stereo imagery by integrating modern learning-based stereo matchers (StereoAnywhere, MonSter, Foundation Stereo, and a satellite fine-tuned variant of MonSter) into the Satellite Stereo Pipeline (S2P). The rectification stage is adapted to satellite imaging geometry so that rectified pairs satisfy the disparity polarity and range the matchers expect, which enables end-to-end DSM generation. Experiments on satellite imagery show consistent improvements in DSM accuracy over classical cost-volume-based matching, and qualitative results show richer geometric detail and sharper structures. The authors release open-source code to support large-scale Earth observation workflows. Two limitations remain: performance over challenging surface types such as vegetation is limited for all evaluated models, and the mean absolute error (MAE) metric exhibits saturation effects, so it does not fully reflect the qualitative gains. Overall, the work provides a practical framework for learning-based 3D reconstruction from satellite imagery.
📝 Abstract
Digital Surface Model (DSM) generation from satellite imagery is a core task in Earth observation and is commonly addressed with classical stereo matching algorithms in pipelines such as the Satellite Stereo Pipeline (S2P). While recent learning-based stereo matchers achieve state-of-the-art performance on standard benchmarks, their integration into operational satellite pipelines remains challenging due to differences in viewing geometry and disparity assumptions. In this work, we integrate several modern learning-based stereo matchers, including StereoAnywhere, MonSter, Foundation Stereo, and a satellite fine-tuned variant of MonSter, into S2P, adapting the rectification stage to enforce compatible disparity polarity and range. We release the corresponding code to enable reproducible use of these methods in large-scale Earth observation workflows. Experiments on satellite imagery show consistent improvements in DSM accuracy over classical cost-volume-based approaches, although commonly used metrics such as mean absolute error (MAE) exhibit saturation effects. Qualitative results reveal substantially improved geometric detail and sharper structures, highlighting the need for evaluation strategies that better reflect perceptual and structural fidelity. At the same time, performance over challenging surface types such as vegetation remains limited across all evaluated models, indicating open challenges for learning-based stereo in natural environments.
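
To make the idea of enforcing a compatible disparity polarity and range concrete, here is a minimal NumPy sketch of one possible approach: shifting the secondary image horizontally so that all disparities become non-negative. This is not the authors' released code. The function names, the integer shift, and the convention that left pixel (x, y) matches right pixel (x - d, y) are assumptions made for illustration only.

```python
import math
import numpy as np


def shift_columns(img: np.ndarray, t: int) -> np.ndarray:
    """Return out[y, x] = img[y, x + t], zero-filled outside the image."""
    out = np.zeros_like(img)
    w = img.shape[1]
    if abs(t) >= w:
        return out  # shift exceeds the image width: nothing overlaps
    if t > 0:
        out[:, : w - t] = img[:, t:]
    elif t < 0:
        out[:, -t:] = img[:, : w + t]
    else:
        out[...] = img
    return out


def to_nonnegative_disparity(right: np.ndarray, d_min: float, d_max: float):
    """Shift the secondary image so disparities lie in [d_min + t, d_max + t] >= 0.

    Assumed convention (hypothetical, for this sketch only):
    left pixel (x, y) matches right pixel (x - d, y).
    """
    t = max(0, math.ceil(-d_min))  # integer shift that makes d_min + t >= 0
    return shift_columns(right, t), t, d_max + t


def restore_disparity(d_pred: np.ndarray, t: int) -> np.ndarray:
    """Map disparities predicted on the shifted pair back to the original pair."""
    return d_pred - t


# Synthetic pair with a constant disparity of -3 under the assumed convention.
rng = np.random.default_rng(0)
left = rng.random((4, 16))
right = shift_columns(left, -3)  # right[y, x] = left[y, x - 3]

right_shifted, t, new_d_max = to_nonnegative_disparity(right, d_min=-3, d_max=-3)
```

After the shift, the synthetic pair has zero disparity wherever the two images overlap, so a matcher that only searches non-negative disparities can be applied. Its prediction is then mapped back to the original pair with `restore_disparity`. A real pipeline would estimate `d_min` and `d_max` from the scene geometry and would likely fold the translation into the rectifying transform rather than resampling the image a second time.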