The Transparent Earth: A Multimodal Foundation Model for the Earth's Subsurface

📅 2025-09-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses sparse, multi-source, heterogeneous subsurface observations (e.g., stress orientation angles, mantle temperatures, tectonic plate types) by proposing a scalable multimodal foundation model that handles all modalities in one network and can predict subsurface properties even when no observations are supplied. Each observation is represented by a positional encoding of its location combined with a modality encoding, which is obtained by applying a text embedding model to a written description of the modality. A Transformer then jointly models eight input types spanning directional angles, categorical labels, and continuous physical quantities. Contributions include: (1) a multimodal foundation model that accepts any subset of modalities as context and supports in-context learning; (2) straightforward extension to new observation types, since a new modality only needs a text description; and (3) a more than threefold reduction in stress orientation prediction error on validation data when observations are supplied as context, along with performance that improves as the parameter count increases.
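
Directional modalities such as stress orientation need an error measure that respects periodicity. The following is a minimal sketch, assuming orientations are axial (an azimuth and the same azimuth plus 180° describe the same direction); the function name `axial_error_deg` is hypothetical, and the metric actually used in the paper is not stated in this summary.

```python
def axial_error_deg(pred_deg, true_deg):
    """Smallest angle in degrees between two axial orientations.

    Assumption: orientations are 180-degree periodic, so the error
    always lies in the range [0, 90].
    """
    diff = abs(pred_deg - true_deg) % 180.0
    return min(diff, 180.0 - diff)


# Orientations of 179 and 1 degrees are only 2 degrees apart once wrapped.
wrapped_error = axial_error_deg(179.0, 1.0)
```

A plain absolute difference would report 178° for the same pair, which is why a wrapped metric is the usual choice for orientation data.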

📝 Abstract

We present the Transparent Earth, a transformer-based architecture for reconstructing subsurface properties from heterogeneous datasets that vary in sparsity, resolution, and modality, where each modality represents a distinct type of observation (e.g., stress angle, mantle temperature, tectonic plate type). The model incorporates positional encodings of observations together with modality encodings, derived from a text embedding model applied to a description of each modality. This design enables the model to scale to an arbitrary number of modalities, making it straightforward to add new ones not considered in the initial design. We currently include eight modalities spanning directional angles, categorical classes, and continuous properties such as temperature and thickness. These capabilities support in-context learning, enabling the model to generate predictions either with no inputs or with an arbitrary number of additional observations from any subset of modalities. On validation data, this reduces errors in predicting stress angle by more than a factor of three. The proposed architecture is scalable and demonstrates improved performance with increased parameters. Together, these advances make the Transparent Earth an initial foundation model for the Earth's subsurface that ultimately aims to predict any subsurface property anywhere on Earth.
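
The encoding idea in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration: the embedding width `DIM`, the sinusoidal form of `positional_encoding`, the use of addition to combine the two encodings, and above all `modality_encoding`, which is a hash-based stand-in for a real text embedding model and carries no semantic meaning.

```python
import hashlib
import math

DIM = 16  # hypothetical embedding width, chosen only for this sketch


def positional_encoding(lat_deg, lon_deg, dim=DIM):
    """Sinusoidal features of latitude and longitude (an assumed scheme)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    feats = []
    for k in range(dim // 4):
        freq = 2.0 ** k
        feats += [math.sin(freq * lat), math.cos(freq * lat),
                  math.sin(freq * lon), math.cos(freq * lon)]
    return feats


def modality_encoding(description, dim=DIM):
    """Stand-in for a text embedding model: a deterministic
    pseudo-embedding in [-1, 1] derived from a hash of the description."""
    digest = hashlib.sha256(description.encode("utf-8")).digest()
    return [(b / 127.5) - 1.0 for b in digest[:dim]]


def make_token(lat_deg, lon_deg, description):
    """Combine where the observation is with what kind of observation it is.
    Summing the two encodings is an assumption, not the paper's stated method."""
    pos = positional_encoding(lat_deg, lon_deg)
    mod = modality_encoding(description)
    return [p + m for p, m in zip(pos, mod)]


# Adding a new modality only requires a new text description.
token = make_token(35.0, -120.0, "stress angle: azimuth of maximum horizontal stress")
```

Because the modality vector comes from text rather than from a fixed lookup table, the input layer does not need to grow when a new observation type is introduced, which is the scalability property the abstract describes.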

Problem

Research questions and friction points this paper is trying to address.

Reconstructing subsurface properties from heterogeneous datasets

Handling varying sparsity, resolution, and modality types

Predicting any subsurface property anywhere on Earth

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based architecture for subsurface reconstruction

Modality encodings from text embeddings for scalability

In-context learning with arbitrary observation inputs
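
Predicting from an arbitrary number of observations, including none, can be sketched with a single attention step. This is a simplified illustration, not the paper's architecture: the `predict` function, the scalar values, and the always-present "prior" slot that supplies an answer when the context is empty are all assumptions.

```python
import math


def predict(query, context, prior_key, prior_value):
    """Attention-weighted estimate at a query location.

    context is a list of (key_vector, value) pairs and may be empty.
    A prior slot is always included, so an empty context returns the prior.
    """
    keys = [prior_key] + [k for k, _ in context]
    values = [prior_value] + [v for _, v in context]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


query = [1.0, 0.0]
no_context = predict(query, [], prior_key=[0.0, 0.0], prior_value=2.5)
with_context = predict(query, [([10.0, 0.0], 5.0)],
                       prior_key=[0.0, 0.0], prior_value=2.5)
```

With no context the estimate equals the prior value, and a context observation whose key aligns with the query pulls the estimate toward that observation's value. The same function handles any number of context entries without changes.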

🔎 Similar Papers

Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation