MarsRetrieval: Benchmarking Vision-Language Models for Planetary-Scale Geospatial Retrieval on Mars

📅 2026-02-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Most existing benchmarks for Mars exploration are confined to closed-set supervised visual tasks and do not support text-guided geospatial retrieval on the Martian surface. To bridge this gap, this work introduces MarsRetrieval, a cross-modal retrieval benchmark for planetary-scale geospatial discovery with three core tasks: paired image-text retrieval, landform retrieval, and global geo-localization. The benchmark spans multiple spatial scales and diverse geomorphic origins, and it defines a unified retrieval-centric evaluation protocol. The authors use this protocol to assess multimodal embedding architectures, including contrastive dual-tower encoders and generative vision-language models. The results show that even strong general-purpose foundation models often fail to capture domain-specific geomorphic distinctions, and that domain-specific fine-tuning is critical for generalizable geospatial discovery in planetary settings.

📝 Abstract

Data-driven approaches like deep learning are rapidly advancing planetary science, particularly in Mars exploration. Despite recent progress, most existing benchmarks remain confined to closed-set supervised visual tasks and do not support text-guided retrieval for geospatial discovery. We introduce MarsRetrieval, a retrieval benchmark for evaluating vision-language models for Martian geospatial discovery. MarsRetrieval includes three tasks: (1) paired image-text retrieval, (2) landform retrieval, and (3) global geo-localization, covering multiple spatial scales and diverse geomorphic origins. We propose a unified retrieval-centric protocol to benchmark multimodal embedding architectures, including contrastive dual-tower encoders and generative vision-language models. Our evaluation shows MarsRetrieval is challenging: even strong foundation models often fail to capture domain-specific geomorphic distinctions. We further show that domain-specific fine-tuning is critical for generalizable geospatial discovery in planetary settings. Our code is available at https://github.com/ml-stat-Sustech/MarsRetrieval
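To make the idea of a retrieval-centric protocol concrete, here is a minimal sketch of how text-to-image retrieval over precomputed embeddings is commonly scored. It is not taken from the MarsRetrieval code: the function names `l2_normalize` and `recall_at_k`, the choice of Recall@k as the metric, and the convention that query `i` is paired with image `i` are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def recall_at_k(text_emb: np.ndarray, image_emb: np.ndarray, k: int) -> float:
    """Fraction of text queries whose paired image ranks in the top k.

    Assumption (hypothetical convention): query i is paired with image i.
    """
    # Cosine similarity matrix of shape (n_queries, n_images).
    sims = l2_normalize(text_emb) @ l2_normalize(image_emb).T
    # Indices of the k most similar images for each query.
    topk = np.argsort(-sims, axis=1)[:, :k]
    # Ground-truth image index for each query, as a column for broadcasting.
    targets = np.arange(sims.shape[0])[:, None]
    return float((topk == targets).any(axis=1).mean())


if __name__ == "__main__":
    # Toy embeddings standing in for the outputs of any encoder.
    rng = np.random.default_rng(0)
    image_emb = rng.normal(size=(5, 8))
    text_emb = image_emb + 0.01 * rng.normal(size=(5, 8))
    print(recall_at_k(text_emb, image_emb, k=1))
```

Because the scoring only needs one embedding matrix per modality, the same function could in principle be applied to a dual-tower encoder or to embeddings extracted from a generative vision-language model, which is what makes a retrieval-centric setup convenient for comparing different architectures.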

Problem

Research questions and friction points this paper is trying to address.

geospatial retrieval

vision-language models

Mars exploration

planetary science

text-guided retrieval

Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language models

geospatial retrieval

Mars exploration