AI Summary
To address the weak transferability of pretrained models caused by structural heterogeneity among Earth observation (EO) multimodal data (such as spectral, elevation, and segmentation maps), this work pioneers the adaptation of the MultiMAE framework to the EO domain. We propose a multimodal, multitask masked autoencoding pretraining method capable of processing arbitrary subsets of modalities. By enforcing cross-modal feature alignment and joint reconstruction, our approach abandons modality-specific pretraining paradigms and enables a unified model to flexibly accommodate heterogeneous inputs. Evaluated on multiple EO benchmarks, our method surpasses state-of-the-art approaches on both classification and segmentation tasks. Under end-to-end fine-tuning, it delivers consistent transfer performance gains of 12.6% to 18.3%, significantly enhancing generalization capability and deployment flexibility.
Abstract
Multi-modal data in Earth Observation (EO) presents a huge opportunity for improving transfer learning capabilities when pre-training deep learning models. Unlike prior work that often overlooks multi-modal EO data, recent methods have started to include it, resulting in more effective pre-training strategies. However, existing approaches commonly face challenges in effectively transferring learning to downstream tasks where the structure of available data differs from that used during pre-training. This paper addresses this limitation by exploring a more flexible multi-modal, multi-task pre-training strategy for EO data. Specifically, we adopt a Multi-modal Multi-task Masked Autoencoder (MultiMAE) that we pre-train by reconstructing diverse input modalities, including spectral, elevation, and segmentation data. The pre-trained model demonstrates robust transfer learning capabilities, outperforming state-of-the-art methods on various EO datasets for classification and segmentation tasks. Our approach exhibits significant flexibility, handling diverse input configurations without requiring modality-specific pre-trained models. Code will be available at: https://github.com/josesosajs/multimae-meets-eo.
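To make the masking idea behind this kind of pre-training more concrete, here is a minimal sketch of how visible tokens might be chosen across whichever modalities are available. It is an illustration under stated assumptions, not the authors' implementation: the function name `sample_visible_tokens`, the token counts, and the uniform sampling over all tokens are hypothetical choices made for this example, and the sampling scheme used in the paper may differ.

```python
import random


def sample_visible_tokens(tokens_per_modality, num_visible, seed=None):
    """Pick `num_visible` tokens to keep across all available modalities.

    tokens_per_modality: dict mapping a modality name to its patch-token count.
    Returns a dict mapping each modality to the sorted indices of its visible
    tokens; every other token is masked and becomes a reconstruction target.
    (Hypothetical helper: uniform sampling is an assumption of this sketch.)
    """
    rng = random.Random(seed)
    # Flatten to (modality, index) pairs so sampling is uniform over all tokens.
    all_tokens = [
        (name, i) for name, n in tokens_per_modality.items() for i in range(n)
    ]
    if num_visible > len(all_tokens):
        raise ValueError("num_visible exceeds the total number of tokens")
    chosen = rng.sample(all_tokens, num_visible)
    visible = {name: [] for name in tokens_per_modality}
    for name, i in chosen:
        visible[name].append(i)
    return {name: sorted(idx) for name, idx in visible.items()}


# Three modalities available (token counts are illustrative, e.g. 14 x 14 patches).
full = sample_visible_tokens(
    {"spectral": 196, "elevation": 196, "segmentation": 196},
    num_visible=98,
    seed=0,
)

# The same routine works when only one modality is present.
spectral_only = sample_visible_tokens({"spectral": 196}, num_visible=49, seed=0)
```

Because the routine only depends on the dictionary it is given, the same code path covers any subset of modalities, which mirrors the flexibility claimed above: no modality-specific variant is needed when an input is missing.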