🤖 AI Summary
This work addresses the semantic gap between pixel-based planetary surface imagery and the natural language used by scientists, a gap that hinders large-scale, open-ended exploration of Martian landforms. To bridge this divide, the authors propose MarScope, a framework that uses a vision–language joint embedding model to align orbital images with textual descriptions in a shared semantic space, enabling arbitrary natural-language queries without predefined labels. Trained on over 200,000 curated image–text pairs, MarScope supports efficient, Mars-wide semantic retrieval and mapping, achieving F1 scores of up to 0.978 with planet-wide queries answered within five seconds. The framework supports both process-oriented and similarity-driven geomorphological analyses, moving beyond conventional fixed-category classification.
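Neither the summary nor the abstract specifies the training objective; joint image–text embeddings of this kind are commonly trained with a CLIP-style symmetric contrastive (InfoNCE) loss, sketched below under that assumption. Here $v_i$ and $t_i$ are the L2-normalized image and text embeddings of the $i$-th pair, $N$ is the batch size, and $\tau$ is a learned temperature:

$$
\mathcal{L} = -\frac{1}{2N}\sum_{i=1}^{N}\left[\log\frac{\exp(v_i^{\top} t_i/\tau)}{\sum_{j=1}^{N}\exp(v_i^{\top} t_j/\tau)} + \log\frac{\exp(t_i^{\top} v_i/\tau)}{\sum_{j=1}^{N}\exp(t_i^{\top} v_j/\tau)}\right]
$$

This objective pulls matched image–text pairs together in the shared space while pushing mismatched pairs apart, which is what makes free-form text usable as a query against images.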
📝 Abstract
Planetary surfaces are typically analyzed through high-level semantic concepts expressed in natural language, yet vast orbital image archives remain organized at the pixel level. This mismatch limits scalable, open-ended exploration of planetary surfaces. Here we present MarScope, a planetary-scale vision–language framework enabling natural-language-driven, label-free mapping of Martian landforms. MarScope aligns planetary images and text in a shared semantic space and is trained on over 200,000 curated image–text pairs. The framework transforms global geomorphic mapping on Mars by replacing predefined classifications with flexible semantic retrieval, answering arbitrary user queries across the entire planet within 5 seconds with F1 scores of up to 0.978. Applications further show that it extends beyond morphological classification to support process-oriented analysis and similarity-based geomorphological mapping at a planetary scale. MarScope establishes a new paradigm in which natural language serves as a direct interface for scientific discovery over massive geospatial datasets.
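As a rough illustration of the label-free retrieval the abstract describes, the sketch below assumes a CLIP-style dual encoder whose image embeddings have been precomputed for every orbital tile. Every name in it (`encode_text`, `semantic_search`, `tile_embeddings`) is a hypothetical placeholder, not MarScope's actual API, and the text encoder is a random stand-in:

```python
import numpy as np

# Hypothetical setup: tile_embeddings is an (N, d) array of L2-normalized
# image embeddings precomputed offline for every orbital image tile, and
# encode_text() maps a free-form query into the same d-dimensional space.
# Neither name comes from the MarScope paper; both are illustrative.

def encode_text(query: str, dim: int = 512) -> np.ndarray:
    """Stand-in for a real text encoder (e.g., a CLIP-style text tower)."""
    rng = np.random.default_rng(abs(hash(query)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def semantic_search(query: str, tile_embeddings: np.ndarray, top_k: int = 5):
    """Rank tiles by cosine similarity to the query embedding.

    Because all embeddings are unit-normalized, cosine similarity reduces
    to a single matrix-vector product over the cached tile embeddings.
    """
    q = encode_text(query, dim=tile_embeddings.shape[1])
    scores = tile_embeddings @ q                 # (N,) similarity scores
    top = np.argsort(scores)[::-1][:top_k]       # indices of best matches
    return top, scores[top]

# Usage: query a synthetic archive with an arbitrary landform description.
N, d = 100_000, 512
tiles = np.random.default_rng(0).standard_normal((N, d))
tiles /= np.linalg.norm(tiles, axis=1, keepdims=True)
idx, sim = semantic_search("dark dune fields with slope streaks", tiles)
print(idx, sim)
```

Precomputing and caching the tile embeddings is what makes planet-wide queries fast: each new query costs only one text-encoder pass plus a single matrix-vector product, or an approximate nearest-neighbor lookup at larger scales.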