🤖 AI Summary
This work addresses the challenge of unified modeling across multi-source, multi-resolution Martian remote sensing data by proposing a foundation model construction approach based on model merging. An Equal Validation Loss (EVL) strategy aligns the convergence stages of models pre-trained independently on three distinct sensors—HiRISE, CTX, and THEMIS—and task arithmetic then fuses their representations across sensors. The result is the first multi-sensor foundation model tailored for Martian remote sensing; it achieves stronger overall performance than ImageNet-pretrained models, Earth observation foundation models, single-sensor pre-trained variants, and fully supervised baselines across nine downstream tasks in Mars-Bench, with particularly significant gains on segmentation.
📝 Abstract
We introduce MOMO, the first multi-sensor foundation model for Mars remote sensing. MOMO uses model merging to integrate representations learned independently from three key Martian sensors (HiRISE, CTX, and THEMIS), spanning resolutions from 0.25 m/pixel to 100 m/pixel. Central to our method is our novel Equal Validation Loss (EVL) strategy, which aligns checkpoints across sensors based on validation loss similarity before fusion via task arithmetic. This ensures models are merged at compatible convergence stages, leading to improved stability and generalization. We train MOMO on a large-scale, high-quality corpus of $\sim 12$ million samples curated from Mars orbital data and evaluate it on 9 downstream tasks from Mars-Bench. MOMO achieves better overall performance than ImageNet pre-trained, Earth observation foundation model, sensor-specific pre-training, and fully supervised baselines. On segmentation tasks in particular, MOMO shows consistent and significant performance improvements. Our results demonstrate that model merging with an optimal checkpoint selection strategy provides an effective approach for building foundation models for multi-resolution data. The model weights, pretraining code, pretraining data, and evaluation code are available at: https://github.com/kerner-lab/MOMO.
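The two-stage recipe described above—pick, per sensor, the checkpoint whose validation loss is closest to a shared target, then merge the selected weights via task arithmetic—can be sketched as follows. This is a minimal illustrative sketch, not the authors' released implementation: the function names, the plain-dict weight representation, and the per-sensor scaling coefficients `alphas` are all assumptions for illustration.

```python
def select_evl_checkpoints(val_losses, target_loss):
    """Equal Validation Loss (EVL) selection sketch: for each sensor,
    pick the checkpoint whose validation loss is closest to a shared
    target loss, so the merged models sit at comparable convergence stages.

    val_losses: {sensor: [(checkpoint_id, val_loss), ...]}
    """
    selected = {}
    for sensor, losses in val_losses.items():
        # Closest validation loss to the shared target wins.
        selected[sensor] = min(losses, key=lambda cl: abs(cl[1] - target_loss))[0]
    return selected


def task_arithmetic_merge(base, sensor_weights, alphas):
    """Task arithmetic merge sketch:
        theta_merged = theta_base + sum_i alpha_i * (theta_i - theta_base)

    base:           {param_name: value} shared initialization
    sensor_weights: {sensor: {param_name: value}} selected checkpoints
    alphas:         {sensor: float} hypothetical per-sensor merge weights
    """
    merged = dict(base)
    for sensor, weights in sensor_weights.items():
        a = alphas[sensor]
        for name in merged:
            # Add the scaled "task vector" (difference from the base).
            merged[name] = merged[name] + a * (weights[name] - base[name])
    return merged
```

In practice the parameter values would be tensors rather than scalars, but the arithmetic is elementwise and identical in shape; the key design point is that EVL selection runs before merging, so the task vectors being combined come from comparably converged models.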