🤖 AI Summary
To address domain discrepancies and hallucination artifacts introduced by preprocessing (e.g., pansharpening) in 3D reconstruction from satellite multispectral (MS) and panchromatic (PAN) images, this paper proposes an end-to-end neural radiance field (NeRF) framework. The method optimizes directly on raw MS/PAN observations, bypassing conventional image-level fusion, while jointly modeling geometry and appearance. Key contributions: (1) a cross-resolution kernel that models the spatial-resolution gap between MS and PAN observations, allowing the two modalities to be fused during training; and (2) a multimodal appearance embedding that jointly encodes spectral band differences and viewpoint variations. Evaluated on WorldView-3 data, the approach achieves an average 17% improvement in depth reconstruction accuracy while improving novel-view synthesis quality and geometric fidelity. This work establishes a differentiable, preprocessing-free paradigm for high-resolution remote-sensing 3D reconstruction.
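The paper text does not include code; the PyTorch sketch below shows one plausible reading of the cross-resolution kernel: a learned low-pass filter that degrades the NeRF's full-resolution render onto the coarser MS grid before the MS photometric loss, while PAN supervision stays at full resolution. All names (`CrossResolutionKernel`, the kernel size, the 4x scale) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossResolutionKernel(nn.Module):
    """Hypothetical sketch of a learned degradation kernel: softmax-normalized
    weights form a valid blur, letting the model explain how the sharp scene
    appears at the multispectral sensor's lower spatial resolution."""

    def __init__(self, kernel_size: int = 5, ms_scale: int = 4):
        super().__init__()
        self.kernel_size = kernel_size
        self.ms_scale = ms_scale  # WorldView-3 MS is roughly 4x coarser than PAN
        self.logits = nn.Parameter(torch.zeros(kernel_size, kernel_size))

    def forward(self, rendered: torch.Tensor) -> torch.Tensor:
        # rendered: (B, C, H, W) full-resolution radiance from the NeRF
        c = rendered.shape[1]
        weights = F.softmax(self.logits.flatten(), dim=0)
        kernel = weights.view(1, 1, self.kernel_size, self.kernel_size)
        kernel = kernel.expand(c, 1, -1, -1)  # shared blur, applied per channel
        blurred = F.conv2d(rendered, kernel,
                           padding=self.kernel_size // 2, groups=c)
        # Downsample to the MS grid; the PAN branch would skip this module.
        return F.interpolate(blurred, scale_factor=1 / self.ms_scale, mode="area")

# Illustrative loss: PAN supervised at full resolution, MS after degradation.
# loss = F.mse_loss(pan_render, pan_obs) + F.mse_loss(kernel(ms_render), ms_obs)
```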
📝 Abstract
We introduce FusionRF, a novel framework for digital surface reconstruction from satellite multispectral and panchromatic images. Recent work has demonstrated that neural photogrammetry reconstructs surfaces from optical satellite images more accurately than classical algorithmic methods. Common imaging satellites produce both a panchromatic and a multispectral image, which carry high spatial resolution and rich spectral information, respectively. Current neural reconstruction methods require the multispectral images to be upsampled by a pansharpening method that draws on the spatial detail in the panchromatic image; however, such methods may introduce biases and hallucinations due to domain gaps. FusionRF instead performs image fusion jointly with optimization, using a novel cross-resolution kernel that learns to resolve the spatial resolution loss present in multispectral images. As input, FusionRF accepts the original multispectral and panchromatic data, eliminating the need for image preprocessing. FusionRF also leverages multimodal appearance embeddings that encode the image characteristics of each modality and view within a uniform representation. By optimizing on both modalities, FusionRF learns to fuse them while performing reconstruction, removing the pansharpening step entirely. We evaluate our method on multispectral and panchromatic WorldView-3 images of various locations and show that FusionRF provides an average 17% improvement in depth reconstruction accuracy and renders sharp training and novel views.
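As a companion sketch (same caveat: an assumed reading, not the authors' implementation), the multimodal appearance embedding can be pictured as NeRF-in-the-Wild-style per-view codes combined with a per-modality code, so one radiance field explains both MS and PAN renderings. The class name and dimensions below are hypothetical.

```python
import torch
import torch.nn as nn

class MultimodalAppearanceEmbedding(nn.Module):
    """Hypothetical sketch: a learned vector per training view plus a learned
    vector per modality (MS vs. PAN); their sum conditions the color head, so
    spectral and per-view appearance variation share one uniform space."""

    def __init__(self, num_views: int, embed_dim: int = 32):
        super().__init__()
        self.view_embed = nn.Embedding(num_views, embed_dim)
        self.modality_embed = nn.Embedding(2, embed_dim)  # 0 = MS, 1 = PAN

    def forward(self, view_idx: torch.Tensor,
                modality_idx: torch.Tensor) -> torch.Tensor:
        # Both indices: (num_rays,) long tensors; output: (num_rays, embed_dim)
        return self.view_embed(view_idx) + self.modality_embed(modality_idx)

# The resulting vector would be concatenated with ray-direction features
# before the color MLP, analogous to NeRF-W appearance embeddings.
```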