SIMPLER: Efficient Foundation Model Adaptation via Similarity-Guided Layer Pruning for Earth Observation

📅 2026-03-20
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
This work addresses the high computational cost of fine-tuning foundation models for Earth observation by proposing an unsupervised, gradient-free layer pruning method that requires neither hyperparameter tuning nor heuristic rules based on parameter counts. Before fine-tuning, the approach uses unlabeled data to compute representational similarity across the layers of a Vision Transformer and automatically identifies and removes redundant layers, determining an effective model depth. Evaluated on Prithvi-EO-2, the method retains 94% of the original performance with only 21% of the parameters, achieving a 2.1× speedup in training and a 2.6× speedup in inference. Generalizability is further demonstrated on TerraMind and ViT-MAE, confirming effectiveness across diverse vision foundation models.
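
To make the similarity step concrete, here is a minimal sketch in PyTorch. It assumes a ViT whose transformer blocks are exposed as `model.blocks` (as in timm-style implementations) and uses linear CKA as the similarity measure; the metric and the helper names (`linear_cka`, `layerwise_similarity`) are illustrative choices, not necessarily those of the paper.

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> float:
    """Linear CKA between two [N, D] activation matrices (Kornblith et al., 2019)."""
    x = x - x.mean(dim=0, keepdim=True)   # center each feature
    y = y - y.mean(dim=0, keepdim=True)
    kx, ky = x @ x.T, y @ y.T             # [N, N] Gram matrices (cheap when N << D)
    hsic = (kx * ky).sum()                # tr(Kx Ky) = ||X^T Y||_F^2
    return (hsic / (kx.norm() * ky.norm())).item()

@torch.no_grad()
def layerwise_similarity(model, unlabeled_batch: torch.Tensor) -> list[float]:
    """Similarity between consecutive ViT blocks on a batch of unlabeled task data."""
    feats: list[torch.Tensor] = []
    hooks = [blk.register_forward_hook(
                 lambda mod, inp, out: feats.append(out.flatten(1)))
             for blk in model.blocks]     # capture [N, tokens * dim] per block
    try:
        model(unlabeled_batch)            # forward pass only; no labels, no gradients
    finally:
        for h in hooks:
            h.remove()
    return [linear_cka(feats[i], feats[i + 1]) for i in range(len(feats) - 1)]
```

On a deep pre-trained backbone, the resulting curve typically climbs toward 1.0 in the later blocks, which is exactly the stabilization of deeper-layer representations that the method exploits.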

📝 Abstract
Fine-tuning foundation models for Earth Observation is computationally expensive, with high time and memory demands for both training and deployment. Parameter-efficient methods reduce training cost but retain full inference complexity, while post-hoc compression optimizes inference only after costly full fine-tuning. We introduce SIMPLER, a pre-fine-tuning architecture selection method that reduces inference and deployment costs by identifying an effective model depth before adaptation. SIMPLER exploits the stabilization of representations in the deeper layers of pre-trained vision transformers: it computes layer-wise representation similarity on unlabeled task data and applies an automated scoring function to select redundant layers, with no gradients, magnitude heuristics, or hyperparameter tuning required. On Prithvi-EO-2, SIMPLER prunes up to 79% of parameters while retaining 94% of baseline performance, yielding a 2.1× training speedup and a 2.6× inference speedup. The method generalizes to TerraMind (a multimodal EO foundation model) and ImageNet-pretrained ViT-MAE, demonstrating applicability across tasks, architectures, and spectral modalities. Code is available at https://gitlab.citius.gal/hpc4rs/simpler.
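
As one way to picture the pre-fine-tuning selection step, the sketch below continues the illustrative code from the summary above: a simple, hyperparameter-free rule that truncates the backbone at the depth where consecutive-layer similarity first stabilizes. The mean-crossing rule and the helper names (`select_depth`, `truncate_backbone`) are hypothetical stand-ins for the paper's automated scoring function.

```python
import torch.nn as nn

def select_depth(sims: list[float]) -> int:
    """Choose how many blocks to keep from consecutive-layer similarities.

    Illustrative rule: prune from the first block whose similarity to its
    predecessor reaches the curve's own mean, i.e. where representations
    have stabilized. No tuning knobs are introduced, but the paper's
    actual scoring function may differ.
    """
    mean_sim = sum(sims) / len(sims)
    for i, s in enumerate(sims):
        if s >= mean_sim:
            return i + 1                  # keep blocks 0..i, drop the rest
    return len(sims) + 1                  # never stabilized: keep all blocks

def truncate_backbone(model: nn.Module, depth: int) -> nn.Module:
    """Drop the redundant deeper blocks before any fine-tuning happens."""
    model.blocks = nn.ModuleList(list(model.blocks)[:depth])
    return model
```

A task head is then attached to the truncated backbone and fine-tuned as usual; because the pruning happens before adaptation rather than after it, the shallower forward pass pays off during both training and inference.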
Problem

Research questions and friction points this paper is trying to address.

foundation model adaptation
Earth Observation
model pruning
computational efficiency
inference cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

layer pruning
representation similarity
foundation model adaptation
parameter efficiency
vision transformer
Víctor Barreiro
Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Spain; Departamento de Electrónica e Computación, Universidade de Santiago de Compostela, Spain
Johannes Jakubik
Research Scientist @ IBM Research Europe
AI for Climate Impact · Deep Learning · Multi-modality
Francisco Argüello
Departamento de Electrónica e Computación, Universidade de Santiago de Compostela, Spain
Dora B. Heras
Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Spain; Departamento de Electrónica e Computación, Universidade de Santiago de Compostela, Spain