Breaking the Black Box: Inherently Interpretable Physics-Informed Machine Learning for Imbalanced Seismic Data

📅 2025-08-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional machine learning-based ground motion models (GMMs) suffer from intrinsic opacity (a “black-box” nature) and from degraded performance near fault ruptures, because near-fault strong-motion records are scarce and the training data are therefore severely imbalanced. To address these challenges, this work proposes a physics-informed, inherently interpretable machine learning framework. The method employs a linear decomposition architecture that explicitly separates source, path, and site effects, so the contribution of each term to a prediction can be read off directly. It introduces HazBinLoss, a novel adaptive loss function that prioritizes high-damage near-fault recordings during training. Additionally, domain-specific physical constraints are embedded as soft priors to enforce scientific consistency. Experimental results show that the model achieves predictive accuracy comparable to state-of-the-art GMMs while improving near-field strong-motion estimation, particularly for large-magnitude events. Its interpretability is grounded in seismological principles, which supports its use in high-stakes seismic hazard assessment and disaster mitigation decision-making.

📝 Abstract

Ground motion models (GMMs) predict how strongly the ground will shake during an earthquake. They are essential for structural analysis, seismic design, and seismic risk assessment studies. Traditional machine learning (ML) approaches are popular for developing GMMs because large earthquake databases are available worldwide. However, they operate as "black boxes" that are hard to interpret and trust, which limits their use in high-stakes decisions. Additionally, these databases suffer from significant data imbalance: large, critically damaging records near the fault are far fewer than the abundant, less damaging distant records. This work addresses both limitations by developing a transparent ML architecture trained with the HazBinLoss function. Each input (e.g., magnitude, distance, their interaction term, etc.) is processed separately, and the results are added linearly to obtain the output, so the exact contribution of each term is known. During training, the HazBinLoss function assigns higher weights to critical near-field, large-magnitude records and lower weights to less critical far-field, smaller-magnitude records, to prevent underprediction of the most damaging scenarios. Our model captures known seismological principles and achieves performance comparable to established GMMs while maintaining transparency. This framework enables broader adoption of ML-based approaches for risk assessment studies and disaster planning.
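
The additive structure described in the abstract (each input processed separately, results summed linearly) can be illustrated with a minimal sketch. This is not the authors' implementation: the sub-network shape, the hand-picked weights, the bias, and the feature names (`magnitude`, `log_distance`, `magnitude_x_log_distance`) are all hypothetical and chosen only to show why per-term contributions are exact in such a model.

```python
import math

def make_subnet(w1, b1, w2):
    """Tiny one-input, one-output network: sum_k w2[k] * tanh(w1[k] * x + b1[k])."""
    def f(x):
        return sum(v * math.tanh(w * x + b) for w, b, v in zip(w1, b1, w2))
    return f

# Hypothetical, hand-picked weights; a real model would learn these from data.
SUBNETS = {
    "magnitude": make_subnet([0.5, -0.3], [0.0, 0.1], [1.2, 0.4]),
    "log_distance": make_subnet([0.8, 0.2], [-0.1, 0.0], [-1.5, -0.3]),
    "magnitude_x_log_distance": make_subnet([0.05], [0.0], [0.6]),
}
BIAS = -1.0  # hypothetical constant term

def features(magnitude, distance_km):
    """Build one scalar input per additive term (feature choice is an assumption)."""
    log_r = math.log(distance_km)
    return {
        "magnitude": magnitude,
        "log_distance": log_r,
        "magnitude_x_log_distance": magnitude * log_r,
    }

def predict_with_contributions(magnitude, distance_km):
    """Return the prediction and the contribution of each term to it."""
    x = features(magnitude, distance_km)
    # Each sub-network sees only its own input, so its output is that term's contribution.
    contributions = {name: SUBNETS[name](x[name]) for name in SUBNETS}
    prediction = BIAS + sum(contributions.values())
    return prediction, contributions

if __name__ == "__main__":
    pred, contrib = predict_with_contributions(7.0, 10.0)
    print("prediction:", pred)
    for name, value in contrib.items():
        print(f"  {name}: {value:+.4f}")
```

Because the output is a plain sum, no post-hoc attribution method is needed: the printed contributions plus the bias reproduce the prediction, and changing only the distance leaves the `magnitude` contribution unchanged.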

Problem

Research questions and friction points this paper is trying to address.

Develops transparent machine learning for seismic ground motion prediction

Addresses data imbalance in earthquake records for accurate modeling

Ensures interpretability while maintaining performance in risk assessment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transparent ML architecture with linear input contributions

HazBinLoss function weighting critical seismic records

Physics-informed model maintaining interpretability and performance
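
The idea of weighting critical records more heavily during training can be sketched as a weighted squared-error loss. The exact definition of HazBinLoss is not given on this page, so the following is only an illustration under stated assumptions: the bin edges are hypothetical, and inverse-frequency weighting over magnitude-distance bins is a stand-in technique, not the paper's formula. It happens to upweight near-field, large-magnitude records simply because they are rare.

```python
from collections import Counter

MAG_EDGES = [5.0, 6.5]         # hypothetical magnitude bin edges
DIST_EDGES_KM = [20.0, 100.0]  # hypothetical distance bin edges (km)

def bin_index(value, edges):
    """Index of the bin containing value: the number of edges it meets or exceeds."""
    return sum(value >= e for e in edges)

def bin_weights(records):
    """Inverse-frequency weight per (magnitude, distance_km) record, scaled to average 1."""
    bins = [(bin_index(m, MAG_EDGES), bin_index(r, DIST_EDGES_KM)) for m, r in records]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]  # records in sparse bins get larger weights
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

def weighted_mse(y_true, y_pred, weights):
    """Weighted mean squared error; errors on high-weight records cost more."""
    total = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)

if __name__ == "__main__":
    # One near-field large event and three far-field small events (made-up values).
    records = [(7.2, 8.0), (4.5, 150.0), (4.8, 180.0), (4.2, 120.0)]
    weights = bin_weights(records)
    print("weights:", weights)
    # The same unit error costs more on the rare near-field record.
    print(weighted_mse([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], weights))
    print(weighted_mse([0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], weights))
```

With weights like these, underpredicting the single near-field record is penalized more than the same error on any one far-field record, which is the qualitative behavior the summary attributes to HazBinLoss.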
