Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks

📅 2025-10-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

The Mars science community has lacked standardized evaluation benchmarks, which has slowed the development of domain-specific foundation models. To address this, we introduce Mars-Bench, the first benchmark built to evaluate models systematically across a broad range of Mars science tasks. It covers both orbital and surface imagery and supports classification, segmentation, and object detection of key geologic features such as craters, cones, boulders, and frost. Mars-Bench packages 20 datasets in a standardized, ready-to-use form and reports baseline evaluations for models pre-trained on natural images and on Earth satellite data, along with state-of-the-art vision-language models. The results suggest that Mars-specific foundation models may offer advantages over general-domain counterparts, which motivates further work on domain-adapted pre-training. By providing a standardized foundation for evaluation, Mars-Bench aims to support the systematic development and comparison of machine learning models for Mars science.

📝 Abstract

Foundation models have enabled rapid progress across many specialized domains by leveraging large-scale pre-training on unlabeled data, demonstrating strong generalization to a variety of downstream tasks. While such models have gained significant attention in fields like Earth Observation, their application to Mars science remains limited. A key enabler of progress in other domains has been the availability of standardized benchmarks that support systematic evaluation. In contrast, Mars science lacks such benchmarks and standardized evaluation frameworks, which has limited progress toward developing foundation models for Martian tasks. To address this gap, we introduce Mars-Bench, the first benchmark designed to systematically evaluate models across a broad range of Mars-related tasks using both orbital and surface imagery. Mars-Bench comprises 20 datasets spanning classification, segmentation, and object detection, focused on key geologic features such as craters, cones, boulders, and frost. We provide standardized, ready-to-use datasets and baseline evaluations using models pre-trained on natural images, Earth satellite data, and state-of-the-art vision-language models. Results from all analyses suggest that Mars-specific foundation models may offer advantages over general-domain counterparts, motivating further exploration of domain-adapted pre-training. Mars-Bench aims to establish a standardized foundation for developing and comparing machine learning models for Mars science. Our data, models, and code are available at: https://mars-bench.github.io/.

Problem

Research questions and friction points this paper is trying to address.

Mars science lacks standardized benchmarks for foundation model evaluation

The absence of such benchmarks has limited progress toward foundation models for Martian tasks

Mars-Bench responds with standardized, ready-to-use datasets for orbital and surface imagery analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

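Introduces Mars-Bench, the first benchmark for systematically evaluating models on Mars science tasks

Provides 20 standardized, ready-to-use datasets spanning classification, segmentation, and object detection

Reports baseline evaluations on orbital and surface imagery for models pre-trained on natural images and Earth satellite data, and for vision-language models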
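
As a rough illustration of what a shared evaluation protocol across tasks and pre-training sources can look like, the sketch below scores toy predictions with one metric per task and averages the scores per (task, pre-training source) pair. Everything in it is an assumption made for illustration: the metric choices (accuracy, mean IoU), the `METRICS` registry, the `evaluate` helper, and the record fields are hypothetical and are not taken from the Mars-Bench code. Object detection is left out because its usual metrics require box matching.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def _check(y_true: Sequence[int], y_pred: Sequence[int]) -> None:
    """Reject empty or mismatched label sequences."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of labels predicted correctly (assumed classification metric)."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean intersection-over-union on flattened masks (assumed segmentation metric).

    Averages over every class that appears in either sequence, so each
    class has a non-empty union.
    """
    _check(y_true, y_pred)
    ious = []
    for c in sorted(set(y_true) | set(y_pred)):
        inter = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        union = sum(t == c or p == c for t, p in zip(y_true, y_pred))
        ious.append(inter / union)
    return sum(ious) / len(ious)


# Hypothetical registry: one metric per task type.
METRICS: Dict[str, Callable[[Sequence[int], Sequence[int]], float]] = {
    "classification": accuracy,
    "segmentation": mean_iou,
}


def evaluate(runs: List[dict]) -> Dict[Tuple[str, str], float]:
    """Average the per-dataset score for each (task, pre-training source) pair."""
    scores: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for run in runs:
        metric = METRICS[run["task"]]
        key = (run["task"], run["pretraining"])
        scores[key].append(metric(run["y_true"], run["y_pred"]))
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


if __name__ == "__main__":
    # Toy labels only; these are not Mars-Bench data or results.
    toy_runs = [
        {"task": "classification", "pretraining": "natural-images",
         "y_true": [0, 1], "y_pred": [0, 0]},
        {"task": "classification", "pretraining": "natural-images",
         "y_true": [1, 1], "y_pred": [1, 1]},
        {"task": "segmentation", "pretraining": "earth-satellite",
         "y_true": [0, 0, 1, 1], "y_pred": [0, 1, 1, 1]},
    ]
    for (task, source), score in evaluate(toy_runs).items():
        print(f"{task:15s} {source:16s} {score:.3f}")
```

Keeping the metric lookup in one registry means every pre-training source is scored by the same function for a given task, which is the property a standardized benchmark is meant to guarantee. The dataset and pre-training labels above are placeholders.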

🔎 Similar Papers

Space-LLaVA: a Vision-Language Model Adapted to Extraterrestrial Applications