🤖 AI Summary
To address high fine-tuning costs, catastrophic forgetting, and poor cross-regional generalization in large remote sensing foundation models, this study systematically evaluates parameter-efficient fine-tuning (PEFT) methods, including LoRA, Adapter, and Prompt Tuning, on five Earth observation datasets. It shows empirically, for the first time, that PEFT significantly improves generalization to unseen geographic regions, and it identifies a recommended configuration that pairs a UNet decoder with fine-tuning without metadata. Experiments on multiple architectures, including Prithvi and SatMAE, within the TerraTorch framework show that PEFT achieves accuracy comparable to or exceeding full fine-tuning while reducing GPU memory consumption by 72% and training time by 65%. These results position PEFT as a lightweight approach to geospatial model adaptation. All code and trained models are publicly released, and the evaluated models and techniques are integrated into TerraTorch.
📝 Abstract
Earth observation (EO) is crucial for monitoring environmental changes, responding to disasters, and managing natural resources. In this context, foundation models facilitate remote sensing image analysis to retrieve relevant geoinformation accurately and efficiently. However, as these models grow in size, fine-tuning becomes increasingly challenging due to the associated computational resources and costs, limiting their accessibility and scalability. Furthermore, full fine-tuning can lead to forgetting pre-trained features and even degrade model generalization. To address these issues, Parameter-Efficient Fine-Tuning (PEFT) techniques offer a promising solution. In this paper, we conduct extensive experiments with various foundation model architectures and PEFT techniques to evaluate their effectiveness on five different EO datasets. Our results provide a comprehensive comparison, offering insights into when and how PEFT methods support the adaptation of pre-trained geospatial models. We demonstrate that PEFT techniques match or even exceed full fine-tuning performance and enhance model generalization to unseen geographic regions, while reducing training time and memory requirements. Additional experiments investigate the effect of architecture choices such as the decoder type or the use of metadata, suggesting UNet decoders and fine-tuning without metadata as the recommended configuration. We have integrated all evaluated foundation models and techniques into the open-source package TerraTorch to support quick, scalable, and cost-effective model adaptation.
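To give a rough sense of why PEFT can lower training cost, the following minimal sketch illustrates the general LoRA idea: the pre-trained weight matrix stays frozen and only a low-rank update is trained. It is not the paper's implementation or the TerraTorch API. The function names, layer sizes, and rank are hypothetical values chosen for illustration, and plain Python lists stand in for tensors.

```python
def lora_param_counts(d_out, d_in, rank):
    """Count parameters for one linear layer W (d_out x d_in) adapted with LoRA.

    LoRA keeps W frozen and trains a low-rank update B @ A, where
    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    full = d_out * d_in              # parameters updated by full fine-tuning
    lora = rank * (d_out + d_in)     # trainable parameters with LoRA
    return {"full": full, "lora": lora, "fraction": lora / full}


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute W x + scale * B (A x) without ever modifying W."""
    base = matvec(W, x)                  # frozen pre-trained path
    update = matvec(B, matvec(A, x))     # trainable low-rank path
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical 1024 x 1024 projection with rank 8 (illustrative values only).
counts = lora_param_counts(1024, 1024, 8)
print(counts["lora"], "of", counts["full"], "parameters are trainable")

# With B initialised to zeros, the adapted layer reproduces the frozen layer.
W = [[1, 2], [3, 4]]
A = [[1, 1]]        # shape (rank=1, d_in=2)
B = [[0], [0]]      # shape (d_out=2, rank=1)
print(lora_forward(W, A, B, [1, 1]))
```

In this toy setting the trainable parameters are 1/64 of the full layer. The savings reported for real models depend on which layers are adapted, the chosen rank, and the decoder, so this arithmetic should not be read as a prediction of the paper's measurements.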