SIMPLER: Efficient Foundation Model Adaptation via Similarity-Guided Layer Pruning for Earth Observation

📅 2026-03-20
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
This work addresses the high computational cost of fine-tuning foundation models for Earth observation by proposing an unsupervised, gradient-free layer pruning method that requires neither hyperparameter tuning nor heuristic rules based on parameter counts. Before fine-tuning, the approach uses unlabeled data to compute representational similarity across the layers of a Vision Transformer and automatically identifies and removes redundant layers, determining an effective model depth. Evaluated on Prithvi-EO-2, the method retains 94% of the original performance with only 21% of the parameters, achieving a 2.1× speedup in training and a 2.6× speedup in inference. Generalizability is further demonstrated on TerraMind and ViT-MAE, confirming effectiveness across diverse vision foundation models.
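
To make the similarity step concrete, here is a minimal sketch in PyTorch. It assumes a ViT whose transformer blocks are exposed as `model.blocks` (as in timm-style implementations) and uses linear CKA as the similarity measure; the metric and the helper names (`linear_cka`, `layerwise_similarity`) are illustrative choices, not necessarily those of the paper.

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> float:
    """Linear CKA between two [N, D] activation matrices (Kornblith et al., 2019)."""
    x = x - x.mean(dim=0, keepdim=True)   # center each feature
    y = y - y.mean(dim=0, keepdim=True)
    kx, ky = x @ x.T, y @ y.T             # [N, N] Gram matrices (cheap when N << D)
    hsic = (kx * ky).sum()                # tr(Kx Ky) = ||X^T Y||_F^2
    return (hsic / (kx.norm() * ky.norm())).item()

@torch.no_grad()
def layerwise_similarity(model, unlabeled_batch: torch.Tensor) -> list[float]:
    """Similarity between consecutive ViT blocks on a batch of unlabeled task data."""
    feats: list[torch.Tensor] = []
    hooks = [blk.register_forward_hook(
                 lambda mod, inp, out: feats.append(out.flatten(1)))
             for blk in model.blocks]     # capture [N, tokens * dim] per block
    try:
        model(unlabeled_batch)            # forward pass only; no labels, no gradients
    finally:
        for h in hooks:
            h.remove()
    return [linear_cka(feats[i], feats[i + 1]) for i in range(len(feats) - 1)]
```

On a deep pre-trained backbone, the resulting curve typically climbs toward 1.0 in the later blocks, which is exactly the stabilization of deeper-layer representations that the method exploits.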

📝 Abstract
Fine-tuning foundation models for Earth Observation is computationally expensive, with high time and memory demands for both training and deployment. Parameter-efficient methods reduce training cost but retain full inference complexity, while post-hoc compression optimizes inference only after costly full fine-tuning. We introduce SIMPLER, a pre-fine-tuning architecture selection method that reduces inference and deployment costs by identifying an effective model depth before adaptation. SIMPLER exploits the stabilization of representations in the deeper layers of pre-trained vision transformers: it computes layer-wise representation similarity on unlabeled task data and applies an automated scoring function to select redundant layers, with no gradients, magnitude heuristics, or hyperparameter tuning required. On Prithvi-EO-2, SIMPLER prunes up to 79% of parameters while retaining 94% of baseline performance, yielding a 2.1× training speedup and a 2.6× inference speedup. The method generalizes to TerraMind (a multimodal EO foundation model) and ImageNet-pretrained ViT-MAE, demonstrating applicability across tasks, architectures, and spectral modalities. Code is available at https://gitlab.citius.gal/hpc4rs/simpler.
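
As one way to picture the pre-fine-tuning selection step, the sketch below continues the illustrative code from the summary above: a simple, hyperparameter-free rule that truncates the backbone at the depth where consecutive-layer similarity first stabilizes. The mean-crossing rule and the helper names (`select_depth`, `truncate_backbone`) are hypothetical stand-ins for the paper's automated scoring function.

```python
import torch.nn as nn

def select_depth(sims: list[float]) -> int:
    """Choose how many blocks to keep from consecutive-layer similarities.

    Illustrative rule: prune from the first block whose similarity to its
    predecessor reaches the curve's own mean, i.e. where representations
    have stabilized. No tuning knobs are introduced, but the paper's
    actual scoring function may differ.
    """
    mean_sim = sum(sims) / len(sims)
    for i, s in enumerate(sims):
        if s >= mean_sim:
            return i + 1                  # keep blocks 0..i, drop the rest
    return len(sims) + 1                  # never stabilized: keep all blocks

def truncate_backbone(model: nn.Module, depth: int) -> nn.Module:
    """Drop the redundant deeper blocks before any fine-tuning happens."""
    model.blocks = nn.ModuleList(list(model.blocks)[:depth])
    return model
```

A task head is then attached to the truncated backbone and fine-tuned as usual; because the pruning happens before adaptation rather than after it, the shallower forward pass pays off during both training and inference.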
Problem

Research questions and friction points this paper is trying to address.

foundation model adaptation
Earth Observation
model pruning
computational efficiency
inference cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

layer pruning
representation similarity
foundation model adaptation
parameter efficiency
vision transformer
Víctor Barreiro
Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Spain; Departamento de Electrónica e Computación, Universidade de Santiago de Compostela, Spain
Johannes Jakubik
Research Scientist @ IBM Research Europe
AI for Climate Impact · Deep Learning · Multi-modality
Francisco Argüello
Departamento de Electrónica e Computación, Universidade de Santiago de Compostela, Spain
Dora B. Heras
Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Spain; Departamento de Electrónica e Computación, Universidade de Santiago de Compostela, Spain