Parallel Seismic Data Processing Performance with Cloud-based Storage

📅 2025-04-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address I/O bottlenecks and scalability limitations arising from rapidly growing seismic waveform data volumes in cloud environments, this paper proposes a cloud-native parallel processing framework based on “preemptive data reduction.” The framework adopts a hybrid paradigm—locally driven yet cloud-coordinated—and integrates the MsPASS system into AWS Lambda’s serverless platform to enable lightweight, cloud-based preprocessing (e.g., filtering, trimming, downsampling), thereby drastically reducing data transfer overhead. Data flow is further optimized via coordinated scheduling between cloud object storage and on-premises HPC resources. Experimental evaluation on terabyte-scale seismic datasets demonstrates throughput comparable to that of local HPC file systems. This work constitutes the first empirical validation of cloud-native architectures for real-time, scalable processing of large-scale seismic waveforms, overcoming the performance and capacity constraints inherent in traditional centralized, on-premises processing paradigms.

📝 Abstract
This article introduces a general processing framework to effectively utilize waveform data stored on modern cloud platforms. The focus is on hybrid processing schemes in which a local system drives processing. We show that downloading files and doing all processing locally is problematic even when the local system is a high-performance compute cluster. Benchmark tests with parallel processing show that this approach always creates a bottleneck as the volume of data being handled increases with more processes pulling data. We find that a hybrid model, in which processing on the cloud reduces the volume of data transferred from the cloud servers to the local system, can dramatically improve processing time. Tests implemented with the Massively Parallel Analysis System for Seismology (MsPASS) utilizing Amazon Web Services' Lambda service yield throughput comparable to processing day files on a local HPC file system. Given the ongoing migration of seismology data to cloud storage, our results show that doing some or all processing on the cloud will be essential for any processing involving large volumes of data.
Problem

Research questions and friction points this paper is trying to address.

Optimizing seismic data processing with cloud storage
Reducing data transfer bottlenecks in hybrid systems
Improving performance for large-scale seismic analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid cloud-local seismic data processing
Utilizes AWS Lambda for parallel processing
Reduces data transfer volume for efficiency
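
The data-reduction idea in the list above can be illustrated with a small sketch. It is not the paper's implementation and makes no MsPASS or AWS calls: the function names `trim_and_decimate` and `transfer_bytes`, the 4-byte sample size, the 40 Hz synthetic trace, and the block-average downsampling (a crude stand-in for a proper anti-alias filter) are all assumptions chosen for illustration. The point is only that trimming to a time window and downsampling before transfer shrinks the number of bytes that must leave the cloud.

```python
import math

BYTES_PER_SAMPLE = 4  # assumption: samples stored as 32-bit values


def trim_and_decimate(samples, sample_rate, t_start, t_end, factor):
    """Cut the [t_start, t_end) window (seconds) and downsample by `factor`.

    Block averaging is used here as a crude low-pass plus decimation step;
    a real workflow would apply a proper anti-alias filter instead.
    """
    i0 = max(0, int(round(t_start * sample_rate)))
    i1 = min(len(samples), int(round(t_end * sample_rate)))
    window = samples[i0:i1]
    n_blocks = len(window) // factor  # drop any trailing partial block
    reduced = [
        sum(window[k * factor:(k + 1) * factor]) / factor
        for k in range(n_blocks)
    ]
    return reduced, sample_rate / factor


def transfer_bytes(n_samples):
    """Bytes needed to move n_samples, ignoring headers and compression."""
    return n_samples * BYTES_PER_SAMPLE


# Hypothetical input: one hour of a 0.5 Hz sine sampled at 40 Hz.
rate = 40.0
trace = [math.sin(2 * math.pi * 0.5 * n / rate) for n in range(int(3600 * rate))]

# Keep only minutes 10 to 20 and downsample by a factor of 4.
reduced, new_rate = trim_and_decimate(trace, rate, 600.0, 1200.0, 4)

full_bytes = transfer_bytes(len(trace))
reduced_bytes = transfer_bytes(len(reduced))
print(f"full: {full_bytes} B, reduced: {reduced_bytes} B, "
      f"ratio: {full_bytes / reduced_bytes:.0f}x")
```

In this toy case the window keeps one sixth of the trace and the decimation keeps one quarter of those samples, so the transfer shrinks by a factor of 24. In a hybrid scheme like the one the abstract describes, a step of this kind would run on the cloud side so that only the reduced samples cross the network.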
Sasmita Mohapatra
High Performance Research Computation, ORI, UT-Dallas
Weiming Yang
Texas Advanced Computing Center, UT-Austin
Zhengtang Yang
Texas Advanced Computing Center, UT-Austin
Chenxiao Wang
Texas Advanced Computing Center, UT-Austin
Jinxin Ma
UT-Austin
G. Pavlis
Department of Earth and Atmospheric Sciences, Indiana University, Bloomington, IN 47405
Yinzhi Wang
Texas Advanced Computing Center
Seismology, High Performance Computing