🤖 AI Summary
This work addresses the lack of robustness in the feature importance rankings that different explainable AI (XAI) methods, such as LIME and SHAP, produce for the same decision in oil and gas exploration risk assessment. To tackle this issue, the study proposes a unified evaluation framework grounded in causal theory, which systematically assesses the consistency of XAI explanations on high-dimensional structured geological data by generating counterfactual samples and quantifying each feature's necessity and sufficiency in a causal sense. The framework not only reveals the divergent stability of LIME and SHAP under noisy or anomalous data conditions but also provides principled guidance for pairing models with explainers, thereby strengthening the credibility and theoretical grounding of hydrocarbon-indicator interpretations.
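The disagreement between explainers that motivates this work can be quantified directly: given the importance rankings two methods assign to the same features, a rank correlation measures how far apart they are. A minimal sketch, using hypothetical LIME and SHAP rankings over illustrative seismic-attribute names (none of these values come from the paper) and a hand-rolled Kendall tau:

```python
from itertools import combinations

# Hypothetical feature-importance ranks for the same decision from two
# explainers (1 = most important). Feature names and ranks are
# illustrative only, not taken from the study.
lime_rank = {"porosity": 1, "amplitude": 2, "frequency": 3, "depth": 4}
shap_rank = {"amplitude": 1, "porosity": 2, "depth": 3, "frequency": 4}

def kendall_tau(r1, r2):
    """Kendall rank correlation between two rankings of the same
    features: +1 means identical order, -1 means fully reversed."""
    pairs = list(combinations(r1, 2))
    # A pair is concordant when both rankings order it the same way.
    concordant = sum(
        (r1[a] - r1[b]) * (r2[a] - r2[b]) > 0 for a, b in pairs
    )
    return (2 * concordant - len(pairs)) / len(pairs)

print(kendall_tau(lime_rank, shap_rank))  # partial agreement, tau = 1/3
```

A tau well below 1 on the same instance is exactly the kind of explainer divergence the framework is built to adjudicate.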
📝 Abstract
In geophysics, hydrocarbon prospect risking involves assessing the risks associated with hydrocarbon exploration by integrating data from various sources. Machine learning-based classifiers trained on tabular data have recently been used to make faster decisions on these prospects. The lack of transparency in the decision-making processes of such models has led to the emergence of explainable AI. Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) are two such explainability methods, which aim to generate insights about a particular decision by ranking the input features in terms of importance. However, the results these two explanation approaches generate for the same scenario have been shown to disagree or diverge, particularly for complex data. This discrepancy arises because the concepts of "importance" and "relevance" are defined differently across these approaches. Thus, grounding these ranked features using theoretically backed causal notions of necessity and sufficiency can serve as a more reliable and robust way to enhance the trustworthiness of these methodologies. We propose a unified framework to generate counterfactuals, quantify necessity and sufficiency, and use these measures to perform a robustness evaluation of the insights provided by LIME and SHAP on high-dimensional structured prospect risking data. This robustness test yields deeper insights into the models' capabilities to handle erroneous data and reveals which explainability module pairs most effectively with which model on our hydrocarbon-indicator dataset.
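The causal notions the abstract leans on can be estimated from counterfactual samples: necessity asks how often changing a feature flips the model's decision, while sufficiency asks how often fixing that feature to its observed value preserves the decision in otherwise random contexts. A minimal sketch of these two estimators, using a toy stand-in classifier (the threshold rule, feature values, and counterfactual sampling scheme below are all assumptions for illustration, not the paper's actual model or data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a prospect-risking classifier: "high risk" (1) when
# a weighted sum of three normalized features exceeds a threshold.
# Purely illustrative, not the study's model.
def classify(x):
    return int(x @ np.array([0.6, 0.3, 0.1]) > 0.5)

def necessity(x, feature, n=1000):
    """Fraction of counterfactuals in which resampling `feature`
    flips the decision: an estimate of causal necessity."""
    y0 = classify(x)
    flips = 0
    for _ in range(n):
        cf = x.copy()
        cf[feature] = rng.uniform(0, 1)   # counterfactual feature value
        flips += classify(cf) != y0
    return flips / n

def sufficiency(x, feature, n=1000):
    """Fraction of random backgrounds in which fixing `feature` to its
    observed value preserves the decision: an estimate of sufficiency."""
    y0 = classify(x)
    keeps = 0
    for _ in range(n):
        bg = rng.uniform(0, 1, size=x.shape)  # random context
        bg[feature] = x[feature]
        keeps += classify(bg) == y0
    return keeps / n

x = np.array([0.9, 0.2, 0.4])
for f in range(3):
    print(f, necessity(x, f), sufficiency(x, f))
```

For this toy rule, feature 0 dominates the decision, so it scores high on both measures while feature 2 is neither necessary nor particularly informative; the framework applies the same logic to rank-check the features LIME and SHAP promote.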