Parallel Seismic Data Processing Performance with Cloud-based Storage

📅 2025-04-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address I/O bottlenecks and scalability limitations arising from rapidly growing seismic waveform data volumes in cloud environments, this paper proposes a cloud-native parallel processing framework based on “preemptive data reduction.” The framework adopts a hybrid paradigm—locally driven yet cloud-coordinated—and integrates the MsPASS system into AWS Lambda’s serverless platform to enable lightweight, cloud-based preprocessing (e.g., filtering, trimming, downsampling), thereby drastically reducing data transfer overhead. Data flow is further optimized via coordinated scheduling between cloud object storage and on-premises HPC resources. Experimental evaluation on terabyte-scale seismic datasets demonstrates throughput comparable to that of local HPC file systems. This work constitutes the first empirical validation of cloud-native architectures for real-time, scalable processing of large-scale seismic waveforms, overcoming the performance and capacity constraints inherent in traditional centralized, on-premises processing paradigms.

📝 Abstract

This article introduces a general processing framework for making effective use of waveform data stored on modern cloud platforms. The focus is on hybrid processing schemes in which a local system drives the processing. We show that downloading files and doing all processing locally is problematic even when the local system is a high-performance computing cluster. Benchmark tests with parallel processing show that this approach always creates a bottleneck as more processes pull data and the volume being handled grows. We find that a hybrid model, in which some processing is done on the cloud servers to reduce the volume of data transferred to the local system, can dramatically improve processing time. Tests implemented with the Massively Parallel Analysis System for Seismology (MsPASS) using Amazon Web Services' Lambda service yield throughput comparable to processing day files on a local HPC file system. Given the ongoing migration of seismology data to cloud storage, our results show that doing some or all processing on the cloud will be essential for any processing involving large volumes of data.
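The idea of reducing data volume on the cloud side before transfer can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not use the MsPASS API. The function `trim_and_decimate`, the event fields (`samples`, `sample_rate_hz`, `t_start_s`, `t_end_s`, `factor`), and the block-averaging step are all hypothetical choices made for illustration. Only the `lambda_handler(event, context)` entry-point shape follows the usual AWS Lambda convention for Python. A real function would read the waveform from object storage and apply a proper anti-alias filter.

```python
import json


def trim_and_decimate(samples, sample_rate_hz, t_start_s, t_end_s, factor):
    """Cut a waveform to [t_start_s, t_end_s) and reduce its sample rate.

    Hypothetical helper: block averaging stands in for a real
    anti-alias filter followed by decimation.
    """
    i0 = max(0, int(t_start_s * sample_rate_hz))
    i1 = min(len(samples), int(t_end_s * sample_rate_hz))
    window = samples[i0:i1]
    reduced = []
    for i in range(0, len(window), factor):
        block = window[i:i + factor]
        reduced.append(sum(block) / len(block))
    return reduced, sample_rate_hz / factor


def lambda_handler(event, context):
    """Lambda-style entry point: return only the reduced waveform.

    Assumption: the samples arrive in the event itself. In practice
    they would be read from cloud object storage next to the function.
    """
    reduced, new_rate = trim_and_decimate(
        event["samples"],
        event["sample_rate_hz"],
        event["t_start_s"],
        event["t_end_s"],
        event["factor"],
    )
    body = {
        "samples": reduced,
        "sample_rate_hz": new_rate,
        "n_in": len(event["samples"]),
        "n_out": len(reduced),
    }
    return {"statusCode": 200, "body": json.dumps(body)}
```

In this sketch the local driver receives `n_out` samples instead of `n_in`, so the trim window and decimation factor directly set how much data crosses the network.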

Problem

Research questions and friction points this paper is trying to address.

Optimizing seismic data processing with cloud storage

Reducing data transfer bottlenecks in hybrid systems

Improving performance for large-scale seismic analysis
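The transfer bottleneck named above can be made concrete with a toy model, sketched here in Python. Everything in it is an assumption for illustration: the function `transfer_time_s`, the idea of a single shared link with a fixed capacity, and all numbers in the usage note. None of it is a measurement or a model taken from the paper.

```python
def transfer_time_s(total_bytes, n_workers, per_worker_bytes_per_s,
                    link_bytes_per_s, reduction=1.0):
    """Toy estimate of the time needed to move data to the local system.

    Assumptions (hypothetical, not from the paper):
    - each worker can pull at most per_worker_bytes_per_s,
    - all workers share one link capped at link_bytes_per_s,
    - cloud-side processing shrinks the data to `reduction` times its
      original size before transfer (1.0 means no reduction).
    """
    moved_bytes = total_bytes * reduction
    aggregate_rate = min(n_workers * per_worker_bytes_per_s, link_bytes_per_s)
    return moved_bytes / aggregate_rate
```

With made-up values of 1 TB of data, 200 MB/s per worker, and a 1 GB/s shared link, the estimate stops improving once five workers saturate the link. Adding workers beyond that point changes nothing, while shrinking the data before transfer (for example `reduction=0.1`) lowers the time in proportion.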

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid cloud-local seismic data processing

Utilizes AWS Lambda for parallel processing

Reduces data transfer volume for efficiency
