🤖 AI Summary
This study addresses the absence of a multimodal benchmark dataset in seismology that integrates seismic waveforms, geospatial imagery, and metadata, a gap that has hindered the application of general-purpose multimodal models to the field. To bridge it, the authors introduce MultiSeismo, a standardized multimodal dataset of over 16,000 earthquake events, which for the first time unifies seismic waveforms, intensity maps, population exposure visualizations, and textual descriptions. Accompanying the dataset is MISCE, a multimodal instruction set built on top of the raw data. Building on the Unified IO 2 architecture and adding a dedicated time-series encoder with structured JSON-based instruction tuning, the authors develop SeisModal, the first domain-specific multimodal model for seismic analysis. Experiments show that SeisModal outperforms existing general-purpose models on multimodal seismic reasoning tasks, supporting MultiSeismo’s value as a benchmark for multimodal earthquake research.
📝 Abstract
The application of generalist multimodal models (GMMs) to specialized scientific domains remains limited by the scarcity of comprehensive domain-specific datasets that integrate data modalities beyond text and images. In seismology, understanding earthquake phenomena requires the synthesis of time-series waveform data, geographical imagery, and contextual metadata, a multimodal integration absent from existing seismic datasets. We present MultiSeismo, a large-scale structured multimodal seismic dataset comprising over 16K seismic events spanning 13 years (2010 to 2023) across diverse geographical regions. Each event record integrates waveform recordings from global station networks, intensity maps, population exposure visualizations, and a comprehensive textual description within a standardized JSON format. We additionally develop MISCE, a multimodal instruction set built on top of the raw data to enable supervised training and evaluation of GMMs on seismic reasoning tasks ranging from basic information retrieval to complex cross-modal analysis. We leverage MISCE to fine-tune an existing multimodal model (Unified IO 2) enhanced with a specialized time-series encoder, which yields SeisModal, the first domain-specific multimodal model for comprehensive seismic analysis. Evaluation of state-of-the-art multimodal models on MultiSeismo reveals significant challenges for general-purpose models, particularly in processing time-series data, while demonstrating SeisModal's superior performance on seismic multimodal reasoning tasks. These results show that MultiSeismo provides a rigorous benchmark for future multimodal research in seismology and validate our domain-specific architectural adaptations.
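To make the idea of a standardized per-event JSON record and an instruction built on top of it more concrete, here is a minimal sketch in Python. It is an illustration only: the abstract does not specify the schema, so every field name, the `EventRecord` and `WaveformRef` classes, the `make_retrieval_instruction` helper, the file paths, and the example values are hypothetical placeholders, not the actual MultiSeismo or MISCE format.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class WaveformRef:
    """Pointer to one station recording (all field names are assumptions)."""
    station: str             # hypothetical station code
    channel: str             # hypothetical channel label
    sampling_rate_hz: float
    path: str                # hypothetical location of the raw samples


@dataclass
class EventRecord:
    """One event combining the four modalities named in the abstract."""
    event_id: str
    origin_time: str         # ISO 8601 timestamp
    magnitude: float
    waveforms: List[WaveformRef] = field(default_factory=list)
    intensity_map: str = ""  # path to an intensity map image
    exposure_map: str = ""   # path to a population exposure image
    description: str = ""    # free-text description of the event


def make_retrieval_instruction(event: EventRecord) -> dict:
    """Build a basic information-retrieval instruction from one record.

    This mirrors the simplest task type mentioned in the abstract; the
    instruction/inputs/response layout is an assumed convention.
    """
    return {
        "instruction": f"What is the magnitude of event {event.event_id}?",
        "inputs": {"description": event.description},
        "response": f"{event.magnitude:.1f}",
    }


# Placeholder values for illustration; this is not real catalog data.
event = EventRecord(
    event_id="example-0001",
    origin_time="2015-01-01T00:00:00Z",
    magnitude=6.1,
    waveforms=[WaveformRef("STA1", "BHZ", 40.0, "waveforms/example-0001/STA1.npy")],
    intensity_map="maps/example-0001/intensity.png",
    exposure_map="maps/example-0001/exposure.png",
    description="Placeholder description of a magnitude 6.1 event.",
)

# asdict() recursively converts nested dataclasses, so the result is JSON-ready.
record_json = json.dumps(asdict(event), indent=2)
sample = make_retrieval_instruction(event)
```

Keeping images and waveforms as path references rather than inline arrays is one plausible design choice for such a record, since it keeps the JSON small while leaving the modality-specific loading to the training pipeline. More complex cross-modal tasks would presumably pair an instruction with several of these references at once.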