Causal-discovery-based root-cause analysis and its application in time-series prediction error diagnosis

📅 2024-11-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Explaining prediction errors, particularly outliers, in machine learning models, especially black-box models, remains challenging due to difficulties in root-cause attribution and the lack of causal interpretability.

Method: This paper proposes a causal-discovery-driven root-cause analysis method that does not require a predefined causal graph. It is the first to integrate data-driven causal discovery algorithms with Shapley value-based attribution, enabling causal-level quantification of each input variable's contribution to time-series prediction errors.

Contribution/Results: Evaluated through synthetic error simulations and multiple real-world time-series diagnostic experiments, the method significantly outperforms mainstream heuristic attribution approaches. It achieves higher accuracy in identifying critical error-inducing variables, improved diagnostic reliability, and greater industrial applicability, overcoming key limitations of prior methods that rely either on domain-knowledge-based causal graphs or on non-causal heuristics.

📝 Abstract
Recent rapid advancements in machine learning have greatly enhanced the accuracy of prediction models, but most models remain "black boxes", making prediction error diagnosis challenging, especially with outliers. This lack of transparency hinders trust and reliability in industrial applications. Heuristic attribution methods, while helpful, often fail to capture true causal relationships, leading to inaccurate error attributions. Various root-cause analysis methods have been developed using Shapley values, yet they typically require predefined causal graphs, limiting their applicability for prediction errors in machine learning models. To address these limitations, we introduce the Causal-Discovery-based Root-Cause Analysis (CD-RCA) method that estimates causal relationships between the prediction error and the explanatory variables, without needing a predefined causal graph. By simulating synthetic error data, CD-RCA can identify variable contributions to outliers in prediction errors by Shapley values. Extensive experiments show CD-RCA outperforms current heuristic attribution methods.
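For reference, the textbook definition of the Shapley value that such attributions build on can be written as follows. Here N is the set of n explanatory variables and v(S) is a value function scoring a subset S of them; the abstract does not say how CD-RCA defines v, so v is a generic placeholder rather than the authors' formulation.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(n - |S| - 1\bigr)!}{n!}
  \Bigl[\, v\bigl(S \cup \{i\}\bigr) - v(S) \,\Bigr],
\qquad
\sum_{i \in N} \phi_i(v) \;=\; v(N) - v(\emptyset).
```

The second identity (efficiency) is what allows the per-variable contributions to be read as a decomposition of the total outlier score.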
Problem

Research questions and friction points this paper is trying to address.

Diagnosing prediction errors in black-box machine learning models
Identifying causal relationships without predefined causal graphs
Improving error attribution accuracy for outliers in time-series
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal-discovery-based root-cause analysis without predefined graphs
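Estimates causal relationships between the prediction error and the explanatory variables
Identifies variable contributions to prediction-error outliers via Shapley values computed on simulated synthetic error data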
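The idea of attributing an outlying prediction error to its upstream variables with Shapley values can be illustrated with a minimal sketch. This is not the CD-RCA implementation: the causal graph, its coefficients, the observed noise values, and the names `simulate_error`, `observed`, and `baseline` are all hypothetical, and the graph is hand-written here, whereas the paper estimates it through causal discovery.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a float."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that p joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical linear structural model (assumed, not from the paper):
# x1 -> x2 -> err and x1 -> err, each node driven by its own noise term.
def simulate_error(noise):
    x1 = noise["x1"]
    x2 = 0.8 * x1 + noise["x2"]
    return 0.5 * x1 + 1.5 * x2 + noise["err"]


observed = {"x1": 0.1, "x2": 4.0, "err": 0.2}  # x2's noise is anomalous
baseline = {"x1": 0.0, "x2": 0.0, "err": 0.0}  # "normal" reference noise


def value(subset):
    # Variables in `subset` keep their observed noise; the rest are reset
    # to the baseline, and the error is re-simulated through the graph.
    noise = {k: (observed[k] if k in subset else baseline[k]) for k in observed}
    return simulate_error(noise)


phi = shapley_values(list(observed), value)
root_cause = max(phi, key=phi.get)
print(phi, root_cause)
```

Because the toy model is linear, each contribution equals the variable's noise multiplied by its total causal effect on the error, so the anomalous `x2` receives the largest share. Exact enumeration grows exponentially with the number of variables, so a sampling approximation would be needed beyond a handful of inputs.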
Hiroshi Yokoyama
Faculty of Data Science, Shiga University / Division of Neural Dynamics, National Institute for Physiological Sciences / RIKEN AIP
Ryusei Shingaki
System AI Laboratory, Corporate Research & Development Center, Toshiba Corporation
Kaneharu Nishino
System AI Laboratory, Corporate Research & Development Center, Toshiba Corporation
Shohei Shimizu
Faculty of Data Science, Shiga University / RIKEN AIP
Thong Pham
Associate Professor, University of South Australia
Prefabrication, Blast and Impact Engineering, Protective Structures, FRP, Sustainable Materials