A Large-scale Benchmark on Geological Fault Delineation Models: Domain Shift, Training Dynamics, Generalizability, Evaluation and Inferential Behavior

📅 2025-05-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Seismic fault delineation models generalize poorly in real-world exploration due to domain shift, scarce expert annotations, inconsistent evaluation protocols, and fragile fine-tuning behavior. Method: We introduce the first large-scale fault delineation benchmark, integrating multi-source real and synthetic datasets (FaultSeg3D, CRACKS, Thebe), and systematically evaluate pretraining, fine-tuning, and joint training strategies across diverse geological, acquisition, and processing domains. We propose a novel protocol to quantify domain shift, track training dynamics, and enable consistent evaluation, and we uncover catastrophic forgetting in fault segmentation for the first time. Contribution/Results: Through 200+ experiments on U-Net variants, we identify key generalization bottlenecks, establish a standardized evaluation framework, deliver practical domain adaptation guidelines for industrial deployment, and publicly release reproducible baselines, advancing the development of more robust and trustworthy seismic interpretation models.

📝 Abstract
Machine learning has taken a critical role in seismic interpretation workflows, especially in fault delineation tasks. However, despite the recent proliferation of pretrained models and synthetic datasets, the field still lacks a systematic understanding of the generalizability limits of these models across seismic data representing a variety of geologic, acquisition and processing settings. Distributional shifts between different data sources, limitations in fine-tuning strategies and labeled data accessibility, and inconsistent evaluation protocols all represent major roadblocks in the deployment of reliable and robust models in real-world exploration settings. In this paper, we present the first large-scale benchmarking study explicitly designed to provide answers and guidelines for domain shift strategies in seismic interpretation. Our benchmark encompasses over 200 models trained and evaluated on three heterogeneous datasets (synthetic and real data) including FaultSeg3D, CRACKS, and Thebe. We systematically assess pretraining, fine-tuning, and joint training strategies under varying degrees of domain shift. Our analysis highlights the fragility of current fine-tuning practices, the emergence of catastrophic forgetting, and the challenges of interpreting performance in a systematic manner. We establish a robust experimental baseline to provide insights into the tradeoffs inherent to current fault delineation workflows, and shed light on directions for developing more generalizable, interpretable and effective machine learning models for seismic interpretation. The insights and analyses reported provide a set of guidelines on the deployment of fault delineation models within seismic interpretation workflows.
Problem

Research questions and friction points this paper is trying to address.

Assessing generalizability limits of fault delineation models across diverse seismic data
Addressing distributional shifts and inconsistent evaluation in seismic interpretation workflows
Improving model robustness and interpretability for real-world exploration settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale benchmarking study for seismic interpretation
Assesses pretraining, fine-tuning, and joint training strategies
Highlights fragility of current fine-tuning practices
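The catastrophic-forgetting analysis above can be illustrated with a minimal sketch: compare per-dataset segmentation scores (e.g. Dice or IoU) before and after fine-tuning on a target dataset, and read a large positive drop on the source dataset as forgetting. The dataset names are from the paper, but the `forgetting` function, the score values, and the protocol details below are illustrative assumptions, not the authors' actual code or results.

```python
def forgetting(scores_before, scores_after):
    """Per-dataset score drop (e.g. Dice/IoU) after fine-tuning.

    Positive values mean performance was lost on that dataset;
    a large drop on the pretraining source suggests catastrophic forgetting.
    """
    return {name: scores_before[name] - scores_after[name]
            for name in scores_before}

# Illustrative numbers only (NOT results reported in the paper):
# scores of a model pretrained on FaultSeg3D, before and after
# fine-tuning on Thebe.
before = {"FaultSeg3D": 0.82, "CRACKS": 0.45, "Thebe": 0.40}
after_ft_thebe = {"FaultSeg3D": 0.61, "CRACKS": 0.43, "Thebe": 0.71}

drops = forgetting(before, after_ft_thebe)
# The fine-tuned model improves on Thebe (negative drop) while losing
# substantial accuracy on FaultSeg3D (positive drop): forgetting.
```

Tracking this per-dataset delta across checkpoints, rather than only the final target-domain score, is what lets a benchmark distinguish genuine adaptation from trading source-domain performance for target-domain gains.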