Root Cause Analysis of Outliers with Missing Structural Knowledge

📅 2024-06-07
🏛️ arXiv.org
📈 Citations: 4
Influential: 1
🤖 AI Summary
Real-world root cause analysis (RCA) faces a critical challenge: the post-intervention distribution is often observed through only a few samples, or even a single one, which makes distribution-dependent methods and regression in low-density regions statistically ill-posed. This paper proposes a lightweight root cause identification framework that requires neither counterfactual reasoning nor a fully specified structural causal model (SCM). It works either from a given causal DAG or, when no DAG is available, from a ranking of anomaly scores alone. The authors prove that anomalies with low scores are unlikely to cause anomalies with high scores, and they derive upper bounds on the probability of causal pathways with non-monotonic anomaly scores. By dropping Shapley-value-based attribution and density-sensitive regression, the methods run in time linear in the number of nodes, O(n). They eliminate SCM fitting and counterfactual computation while providing rigorous theoretical guarantees and strong empirical performance.

📝 Abstract
Recent work conceptualized root cause analysis (RCA) of anomalies via quantitative contribution analysis using causal counterfactuals in structural causal models (SCMs). The framework comes with three practical challenges: (1) it requires the causal directed acyclic graph (DAG), together with an SCM, (2) it is statistically ill-posed since it probes regression models in regions of low probability density, (3) it relies on Shapley values, which are computationally expensive to compute. In this paper, we propose simplified, efficient methods of root cause analysis when the task is to identify a unique root cause instead of quantitative contribution analysis. Our proposed methods run in time linear in the number of SCM nodes, and they require only the causal DAG without counterfactuals. Furthermore, for those use cases where the causal DAG is unknown, we justify the heuristic of identifying root causes as the variables with the highest anomaly score. To this end, we prove that anomalies with small scores are unlikely to cause those with large scores and show upper bounds for the likelihood of causal pathways with non-monotonic anomaly scores.
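
The highest-anomaly-score heuristic from the abstract can be illustrated with a minimal sketch. The abstract does not define the score, so the code assumes a simple empirical tail score, -log P(|Z| >= |z|), computed per variable from its own history; the function names, variable names, and toy numbers are hypothetical and not taken from the paper.

```python
import math
from statistics import mean, pstdev


def anomaly_score(history, value):
    """Assumed score: smoothed empirical estimate of -log P(|Z| >= |z|).

    `history` holds normal-regime samples of one variable and `value` is the
    single observation to be scored. Add-one smoothing keeps the score finite.
    """
    mu = mean(history)
    sigma = pstdev(history) or 1.0  # guard against a constant history
    z = abs(value - mu) / sigma
    exceed = sum(1 for h in history if abs(h - mu) / sigma >= z)
    return -math.log((exceed + 1) / (len(history) + 1))


def rank_by_anomaly_score(histories, observation):
    """Return (variable, score) pairs sorted from most to least anomalous."""
    scores = {
        name: anomaly_score(histories[name], observation[name])
        for name in observation
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (illustrative only): three monitored variables, one anomalous sample.
histories = {
    "db_latency": [10, 11, 9, 10, 12, 10, 11, 9, 10, 11],
    "api_latency": [50, 52, 49, 51, 50, 53, 48, 50, 51, 49],
    "cpu_load": [0.30, 0.32, 0.29, 0.31, 0.30, 0.33, 0.28, 0.30, 0.31, 0.29],
}
observation = {"db_latency": 40, "api_latency": 52, "cpu_load": 0.30}

ranking = rank_by_anomaly_score(histories, observation)
root_cause = ranking[0][0]  # candidate root cause: the top-ranked variable
print(ranking)
```

Only marginal histories and one anomalous observation are needed here; no causal graph, SCM, or post-intervention distribution is estimated, which is the setting the heuristic targets.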

Problem

Research questions and friction points this paper is trying to address.

Identifies root causes of anomalies with missing causal graph knowledge
Addresses single-sample limitations in post-intervention distribution analysis
Provides guarantees for root cause detection in polytree structures

Innovation

Methods, ideas, or system contributions that make the work stand out.

Polytree traversal algorithm using marginal anomaly scores
Root cause identification without distribution estimation
Causal justification for highest anomaly score variables
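
One plausible reading of a graph traversal over marginal anomaly scores is sketched below. This is an assumption for illustration, not the paper's algorithm: starting from the node where the anomaly was noticed, the walk moves upstream to the most anomalous parent and stops when no parent exceeds a threshold. The function name, the threshold, the graph, and the scores are all hypothetical.

```python
def trace_root_cause(parents, scores, target, threshold):
    """Walk upstream from `target` while some parent is anomalous.

    parents:   dict mapping each node to a list of its direct causes
    scores:    dict mapping each node to its marginal anomaly score
    threshold: scores at or above this value count as anomalous (assumed)
    Each node is visited at most once, so the walk is linear in the
    number of nodes and edges it inspects.
    """
    current = target
    visited = {current}
    while True:
        anomalous = [
            p for p in parents.get(current, [])
            if scores[p] >= threshold and p not in visited
        ]
        if not anomalous:
            return current  # no anomalous parent left: report this node
        current = max(anomalous, key=lambda p: scores[p])
        visited.add(current)


# Toy polytree (illustrative only): edges point from cause to effect.
parents = {
    "checkout_errors": ["api_latency"],
    "api_latency": ["db_latency", "cpu_load"],
    "db_latency": ["disk_io"],
    "cpu_load": [],
    "disk_io": [],
}
scores = {
    "checkout_errors": 4.1,
    "api_latency": 3.7,
    "db_latency": 5.2,
    "cpu_load": 0.4,
    "disk_io": 0.6,
}

found = trace_root_cause(parents, scores, "checkout_errors", threshold=3.0)
print(found)
```

The sketch uses only the graph structure and per-node scores, so it needs neither a fitted SCM nor counterfactual queries.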