PyAWD: A Library for Generating Large Synthetic Datasets of Acoustic Wave Propagation with Devito

📅 2024-11-19
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Seismic data are sparse and spatially uneven, which limits machine learning applications in seismology, and the authors report that no existing tool generates large datasets of simulated ground-motion measurements. To address this, they introduce PyAWD, a Devito-based Python library for generating configurable, high-resolution synthetic datasets of acoustic wave propagation for machine learning. PyAWD supports 2D and 3D heterogeneous media and gives fine control over wave speed, external forces, spatial and temporal discretization, and media composition. The library is illustrated on an epicenter retrieval task, which is also used to study data budgeting, that is, the trade-off between the amount of data available and predictive performance.

📝 Abstract
Seismic data is often sparse and unevenly distributed due to the high costs and logistical challenges associated with deploying physical seismometers, limiting the application of Machine Learning (ML) in earthquake analysis. While simulation methods exist, no tool allows the generation of large datasets containing simulated measurements of the ground motion. To address this gap, we introduce PyAWD, a Python library designed to generate high-resolution synthetic datasets simulating spatio-temporal acoustic wave propagation in both two-dimensional and three-dimensional heterogeneous media. By allowing fine control over parameters such as the wave speed, external forces, spatial and temporal discretization, and media composition, PyAWD enables the creation of ML-scale datasets that capture the complexity of seismic wave behavior. We illustrate the library's potential with an epicenter retrieval task, showcasing its suitability for designing complex, accurate seismic problems that require advanced ML approaches when dense real-world data are scarce or unavailable. We also show that our tool is useful for tackling the problem of data budgeting in the framework of epicenter retrieval.
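To make the kind of simulation described in the abstract concrete, here is a minimal NumPy sketch of a 2D acoustic wave propagating through a two-layer medium and being recorded at a few receiver positions. It does not use PyAWD or Devito and does not reproduce their APIs. The function name `simulate_acoustic_2d`, the constant-density wave equation u_tt = c² ∇²u + f, the Ricker source wavelet, the fixed (reflecting) boundaries, and all grid values are assumptions chosen for illustration.

```python
import numpy as np

def simulate_acoustic_2d(c, src, n_steps, dx=10.0, dt=1e-3, f0=10.0):
    """Hypothetical helper: second-order leapfrog finite differences for
    u_tt = c^2 * laplacian(u) + f on a uniform 2D grid.

    c       : 2D array of wave speeds in m/s (heterogeneous medium)
    src     : (row, col) grid index of the point source
    n_steps : number of time steps to simulate
    Returns an array of wavefield snapshots, shape (n_steps, ny, nx).
    """
    ny, nx = c.shape
    u_prev = np.zeros_like(c)
    u = np.zeros_like(c)
    frames = np.empty((n_steps, ny, nx))
    for n in range(n_steps):
        # Five-point Laplacian on interior points only.
        lap = np.zeros_like(u)
        lap[1:-1, 1:-1] = (
            u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
            - 4.0 * u[1:-1, 1:-1]
        ) / dx**2
        # Ricker wavelet as the external force, delayed by one period.
        a = (np.pi * f0 * (n * dt - 1.0 / f0)) ** 2
        force = (1.0 - 2.0 * a) * np.exp(-a)
        u_next = 2.0 * u - u_prev + dt**2 * c**2 * lap
        u_next[src] += dt**2 * force
        # Fixed boundaries (assumed here): edges held at zero, so waves reflect.
        u_next[0, :] = u_next[-1, :] = 0.0
        u_next[:, 0] = u_next[:, -1] = 0.0
        u_prev, u = u, u_next
        frames[n] = u
    return frames

# Two-layer medium: slower layer on top, faster layer below.
c = np.full((64, 64), 1500.0)
c[32:, :] = 2500.0

frames = simulate_acoustic_2d(c, src=(20, 32), n_steps=200)

# Sample the wavefield at three receiver positions, as a seismometer array would.
rec_rows, rec_cols = [5, 5, 5], [10, 32, 54]
traces = frames[:, rec_rows, rec_cols]  # shape (n_steps, n_receivers)
```

Repeating this with randomly drawn source positions and storing each `traces` array alongside its `src` gives input/label pairs of the kind an epicenter retrieval model would train on.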
Problem

Research questions and friction points this paper is trying to address.

Generates synthetic datasets for seismic wave propagation
Addresses sparse real-world seismic data for ML applications
Enables customizable wave simulation parameters for ML-scale datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates synthetic acoustic wave datasets
Controls wave speed and media composition
Supports 2D and 3D heterogeneous media
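
Wave speed and the spatial and temporal discretization cannot be chosen independently in an explicit finite-difference solver: the time step has to satisfy a stability (CFL) condition. The sketch below computes that bound for the basic second-order leapfrog scheme on a uniform grid, where c_max * dt / dx must not exceed 1 / sqrt(d) in d dimensions. The helper `max_stable_dt` and the numeric values are hypothetical, and higher-order stencils generally need a smaller time step than this bound.

```python
import math

def max_stable_dt(c_max, dx, ndim):
    """Hypothetical helper: largest stable time step (s) for the second-order
    leapfrog scheme, given the fastest wave speed c_max (m/s), the grid
    spacing dx (m), and the number of spatial dimensions ndim."""
    return dx / (c_max * math.sqrt(ndim))

# Same medium and grid spacing, in 2D and in 3D.
dt_2d = max_stable_dt(c_max=3000.0, dx=10.0, ndim=2)
dt_3d = max_stable_dt(c_max=3000.0, dx=10.0, ndim=3)
print(f"2D: dt <= {dt_2d:.6f} s, 3D: dt <= {dt_3d:.6f} s")
```

The 3D bound is tighter than the 2D one, so moving a configuration from 2D to 3D may require more time steps as well as more grid points.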