REMSA: An LLM Agent for Foundation Model Selection in Remote Sensing

📅 2025-11-21
🤖 AI Summary
Remote sensing foundation model (RSFM) selection is hampered by fragmented documentation, heterogeneous formats, and diverse deployment constraints. To address this, we propose REMSA, the first large language model (LLM) agent designed specifically for RSFM selection. From a natural-language query, REMSA interprets user requirements, completes missing constraints, ranks candidate models using in-context learning, and provides transparent justifications for its recommendations. It relies only on publicly available model metadata and does not access private or sensitive data. We further construct RS-FMD, a structured database covering 150+ RSFMs, and an expert-verified benchmark of 75 query scenarios that yields 900 configurations. On this benchmark, REMSA outperforms several baselines, including naive agents, dense retrieval, and unstructured retrieval-augmented generation (RAG). This work advances standardization and interpretability in RSFM selection, enabling scalable, trustworthy, and explainable model recommendation for remote sensing applications.


📝 Abstract
Foundation Models (FMs) are increasingly used in remote sensing (RS) for tasks such as environmental monitoring, disaster assessment, and land-use mapping. These models include unimodal vision encoders trained on a single data modality and multimodal architectures trained on combinations of SAR, multispectral, hyperspectral, and image-text data. They support diverse RS tasks including semantic segmentation, image classification, change detection, and visual question answering. However, selecting an appropriate remote sensing foundation model (RSFM) remains difficult due to scattered documentation, heterogeneous formats, and varied deployment constraints. We introduce the RSFM Database (RS-FMD), a structured resource covering over 150 RSFMs spanning multiple data modalities, resolutions, and learning paradigms. Built on RS-FMD, we present REMSA, the first LLM-based agent for automated RSFM selection from natural language queries. REMSA interprets user requirements, resolves missing constraints, ranks candidate models using in-context learning, and provides transparent justifications. We also propose a benchmark of 75 expert-verified RS query scenarios, producing 900 configurations under an expert-centered evaluation protocol. REMSA outperforms several baselines, including naive agents, dense retrieval, and unstructured RAG-based LLMs. It operates entirely on publicly available metadata and does not access private or sensitive data.
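The abstract describes RS-FMD as a structured resource and REMSA as an agent that resolves missing constraints before ranking candidates. The Python sketch below illustrates that idea under stated assumptions: the record fields (`modalities`, `tasks`, `params_millions`), the `DEFAULTS` table, the helpers `complete_constraints` and `filter_candidates`, and the `model-*` catalog entries are all hypothetical and do not reflect the actual RS-FMD schema or REMSA's implementation.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical structured entry for one RSFM (not the real RS-FMD schema)."""
    name: str
    modalities: frozenset[str]  # e.g. {"optical", "sar"}
    tasks: frozenset[str]       # e.g. {"segmentation", "classification"}
    params_millions: float      # model size, used here as a deployment constraint


@dataclass
class Constraints:
    """Hypothetical constraints parsed from a natural-language query."""
    modality: Optional[str] = None
    task: Optional[str] = None
    max_params_millions: Optional[float] = None


# Hypothetical defaults used to complete constraints the user left unspecified.
DEFAULTS = {"max_params_millions": 1000.0}


def complete_constraints(c: Constraints) -> Constraints:
    """Fill only the fields that are None; user-specified values are kept."""
    missing = {k: v for k, v in DEFAULTS.items() if getattr(c, k) is None}
    return replace(c, **missing)


def filter_candidates(records: list[ModelRecord], c: Constraints) -> list[ModelRecord]:
    """Return the records that satisfy every constraint that is set."""
    selected = []
    for r in records:
        if c.modality is not None and c.modality not in r.modalities:
            continue
        if c.task is not None and c.task not in r.tasks:
            continue
        if c.max_params_millions is not None and r.params_millions > c.max_params_millions:
            continue
        selected.append(r)
    return selected


# Usage with a made-up catalog (names and sizes are placeholders, not real RSFMs).
catalog = [
    ModelRecord("model-a", frozenset({"optical"}), frozenset({"segmentation"}), 300.0),
    ModelRecord("model-b", frozenset({"optical", "sar"}),
                frozenset({"segmentation", "classification"}), 1500.0),
    ModelRecord("model-c", frozenset({"sar"}), frozenset({"change_detection"}), 90.0),
]
query = complete_constraints(Constraints(modality="optical", task="segmentation"))
print([r.name for r in filter_candidates(catalog, query)])  # ['model-a']
```

Here the completed size limit excludes `model-b` even though the user never mentioned model size. The hard filter is only a stand-in for whatever candidate selection REMSA actually performs; the abstract states only that candidates are then ranked with in-context learning.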
Problem

Research questions and friction points this paper is trying to address.

Selecting appropriate remote sensing foundation models remains difficult due to scattered documentation
Existing models are documented in heterogeneous formats, and users face varied deployment constraints
Automated model selection from natural-language queries requires interpreting user requirements and resolving missing constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured database (RS-FMD) covering 150+ remote sensing foundation models
LLM-based agent (REMSA) for automated model selection from natural-language queries
Benchmark with 75 expert-verified query scenarios for evaluation
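Ranking with in-context learning usually means placing a few worked demonstrations in the prompt ahead of the new query. The following Python sketch only assembles such a few-shot prompt. The block format, the instruction text, the function names `format_block` and `build_ranking_prompt`, and the example queries and `model-*` names are assumptions for illustration, not REMSA's actual prompt, and the call to an LLM is deliberately omitted.

```python
from typing import Optional, Sequence


def format_block(query: str, candidates: Sequence[str],
                 ranking: Optional[Sequence[str]] = None) -> str:
    """Render one query; demonstrations carry a ranking, the target query does not."""
    if ranking is not None and sorted(ranking) != sorted(candidates):
        raise ValueError("a demonstration ranking must order exactly its candidates")
    lines = [f"Query: {query}", "Candidates: " + ", ".join(candidates)]
    if ranking is not None:
        lines.append("Ranking: " + " > ".join(ranking))
    else:
        lines.append("Ranking:")  # left open for the LLM to complete
    return "\n".join(lines)


def build_ranking_prompt(
    examples: Sequence[tuple[str, Sequence[str], Sequence[str]]],
    query: str,
    candidates: Sequence[str],
    instruction: str = ("Rank the candidate models for the last query, following "
                        "the examples, and briefly justify the top choice."),
) -> str:
    """Few-shot prompt: instruction, worked demonstrations, then the open query."""
    parts = [instruction]
    parts.extend(format_block(q, cands, rank) for q, cands, rank in examples)
    parts.append(format_block(query, candidates))
    return "\n\n".join(parts)


# Hypothetical demonstration and target query.
demos = [
    ("Flood mapping from SAR imagery", ["model-a", "model-c"], ["model-c", "model-a"]),
]
prompt = build_ranking_prompt(demos, "Crop-type segmentation from optical imagery",
                              ["model-a", "model-b"])
print(prompt)
```

Keeping demonstrations and the target query in the same block format lets the model infer the expected output shape from the examples alone.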
Binger Chen
Technische Universität Berlin & BIFOLD, Berlin, Germany
Tacettin Emre Bök
Technische Universität Berlin & BIFOLD, Berlin, Germany
Behnood Rasti
Technische Universität Berlin & BIFOLD, Berlin, Germany
Volker Markl
Technische Universität Berlin
Database Systems, Data Management, Big Data, Programming Models, Query Processing
Begüm Demir
Professor, BIFOLD and Faculty of EECS, Technische Universität Berlin
Remote Sensing, Machine Learning, Image Analysis, Signal Processing