🤖 AI Summary
This work addresses large-scale 3D surface reconstruction from multi-temporal satellite imagery, which is hindered by illumination variations, sensor heterogeneity, and the high computational cost of per-scene optimization. The authors propose SwiftGS, a meta-learned system built on a hybrid representation that decouples geometry from radiance. SwiftGS predicts Gaussian primitives and a lightweight signed distance field (SDF) in a single forward pass, and combines physics-aware rendering with a scene-conditioned meta-training paradigm to capture priors that transfer across scenes. Key components include a differentiable physics graph, spatially gated fusion, semantic-geometric fusion, and conditional lightweight task heads. Without requiring per-scene fine-tuning, SwiftGS achieves high-fidelity digital surface model (DSM) reconstruction and view-consistent rendering while substantially reducing computational overhead.
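As a rough illustration of the spatially gated fusion mentioned above, the sketch below blends two per-pixel height estimates with a gate in [0, 1]. The summary does not give the actual gating formulation, so everything here is an assumption made for illustration: the function name `gated_fusion`, the sigmoid gate, and the use of height maps as the fused quantity are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(gaussian_height, sdf_height, gate_logit):
    """Blend two 2D height maps pixel by pixel (hypothetical helper).

    gaussian_height: heights from the sparse Gaussian branch (assumed input).
    sdf_height:      heights from the global SDF branch (assumed input).
    gate_logit:      per-pixel logits; a high logit favors the Gaussian branch.
    """
    if not (len(gaussian_height) == len(sdf_height) == len(gate_logit)):
        raise ValueError("all inputs must have the same number of rows")
    fused = []
    for g_row, s_row, l_row in zip(gaussian_height, sdf_height, gate_logit):
        if not (len(g_row) == len(s_row) == len(l_row)):
            raise ValueError("all inputs must have the same row length")
        row = []
        for g, s, logit in zip(g_row, s_row, l_row):
            w = sigmoid(logit)  # gate weight in (0, 1)
            row.append(w * g + (1.0 - w) * s)
        fused.append(row)
    return fused


# A logit of 0 weights both branches equally; a large logit trusts the Gaussians.
fused = gated_fusion([[2.0, 10.0]], [[4.0, 0.0]], [[0.0, 50.0]])
```

With a gate logit of 0 the two branches are averaged, so the first pixel fuses 2.0 and 4.0 into 3.0. In a learned system the logits would presumably come from a network, letting the model rely on Gaussian detail where it is dense and fall back to the SDF elsewhere.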
📝 Abstract
Rapid, large-scale 3D reconstruction from multi-date satellite imagery is vital for environmental monitoring, urban planning, and disaster response, yet remains difficult due to illumination changes, sensor heterogeneity, and the cost of per-scene optimization. We introduce SwiftGS, a meta-learned system that reconstructs 3D surfaces in a single forward pass by predicting geometry-radiance-decoupled Gaussian primitives together with a lightweight signed distance field (SDF), replacing expensive per-scene fitting with episodic training that captures transferable priors. The model couples a differentiable physics graph for projection, illumination, and sensor response with spatial gating that blends sparse Gaussian detail and global SDF structure. It also incorporates semantic-geometric fusion, conditional lightweight task heads, and multi-view supervision from a frozen geometric teacher under an uncertainty-aware multi-task loss. At inference, SwiftGS operates zero-shot with optional compact calibration and achieves accurate digital surface model (DSM) reconstruction and view-consistent rendering at significantly reduced computational cost, with ablations highlighting the benefits of the hybrid representation, physics-aware rendering, and episodic meta-training.
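The abstract does not state the form of its uncertainty-aware multi-task loss. One widely used choice, assumed here purely for illustration, weights each task loss by a learned log-variance term: L = Σ_i exp(-s_i) · L_i + s_i, where s_i = log σ_i². The function name and the example task values below are hypothetical and are not taken from the paper.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses using log-variance weights (assumed form).

    task_losses:   one scalar loss per task, e.g. height error, photometric error.
    log_variances: s_i = log(sigma_i^2) per task; learnable in a real model.
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("need exactly one log-variance per task loss")
    total = 0.0
    for loss, s in zip(task_losses, log_variances):
        # exp(-s) down-weights uncertain tasks; the + s term penalizes
        # inflating the variance just to shrink the weighted loss.
        total += math.exp(-s) * loss + s
    return total


# With all log-variances at 0, this reduces to a plain sum of the task losses.
combined = uncertainty_weighted_loss([1.0, 2.0], [0.0, 0.0])
```

Under this assumed form, a task with a large log-variance contributes less to the gradient, which lets noisy supervision signals (for example, from a teacher model) be discounted automatically rather than through hand-tuned weights.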