🤖 AI Summary
Real-world root cause analysis (RCA) faces a critical challenge: post-intervention distributions often contain only a few samples, or even a single one, so methods that depend on the full distribution or that evaluate regression models in low-density regions are statistically ill-posed. This paper proposes a lightweight framework for identifying a unique root cause that requires neither counterfactual reasoning nor a fully specified structural causal model (SCM). It operates either on a given causal DAG or, when no DAG is available, solely on a ranking of anomaly scores. The paper proves that anomalies with small scores are unlikely to cause anomalies with large scores, and it derives upper bounds on the likelihood of causal pathways with non-monotonic anomaly scores. By avoiding Shapley-value-based attribution, SCM fitting, and counterfactual computation, the DAG-based methods run in time linear in the number of nodes, O(n), while providing theoretical guarantees and strong empirical performance.
📝 Abstract
Recent work conceptualized root cause analysis (RCA) of anomalies via quantitative contribution analysis using causal counterfactuals in structural causal models (SCMs). The framework comes with three practical challenges: (1) it requires the causal directed acyclic graph (DAG), together with an SCM, (2) it is statistically ill-posed since it probes regression models in regions of low probability density, and (3) it relies on Shapley values, which are computationally expensive to compute. In this paper, we propose simplified, efficient methods of root cause analysis for the case where the task is to identify a unique root cause rather than to perform quantitative contribution analysis. Our proposed methods run in time linear in the number of SCM nodes, and they require only the causal DAG, without counterfactuals. Furthermore, for use cases where the causal DAG is unknown, we justify the heuristic of identifying root causes as the variables with the highest anomaly score. To this end, we prove that anomalies with small scores are unlikely to cause those with large scores, and we show upper bounds for the likelihood of causal pathways with non-monotonic anomaly scores.
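The highest-score heuristic for the case without a DAG can be illustrated with a minimal Python sketch. The abstract does not specify how anomaly scores are computed, so the score used here (the negative log of a smoothed empirical upper-tail probability) is an assumption made for illustration, and the function names `anomaly_score` and `top_scored_variable` are hypothetical rather than taken from the paper.

```python
import math


def anomaly_score(value, reference):
    """Score `value` against reference samples of the same variable.

    Assumption (not from the paper): the score is the negative log of the
    empirical probability of seeing a reference value at least as large
    as `value`. Larger scores mean rarer, more anomalous observations.
    """
    n_extreme = sum(1 for r in reference if r >= value)
    # Add-one smoothing keeps the tail probability strictly positive,
    # so the logarithm is always defined.
    tail_prob = (n_extreme + 1) / (len(reference) + 1)
    return -math.log(tail_prob)


def top_scored_variable(observation, reference_samples):
    """Return the variable with the highest anomaly score, plus all scores.

    `observation` maps variable name -> anomalous value;
    `reference_samples` maps variable name -> list of normal-regime values.
    One pass over the variables, no DAG, SCM, or counterfactuals needed.
    """
    scores = {
        name: anomaly_score(observation[name], reference_samples[name])
        for name in observation
    }
    return max(scores, key=scores.get), scores


# Toy usage with made-up numbers (hypothetical variable names).
reference_samples = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "b": [1, 2, 3, 4, 5, 6, 7, 8, 9],
}
observation = {"a": 10, "b": 5}
candidate, scores = top_scored_variable(observation, reference_samples)
print(candidate, scores)
```

In this toy input, `a` lies above every reference value while `b` sits mid-range, so `a` receives the larger score and is returned as the candidate root cause. The sketch only ranks variables; it does not reproduce the paper's probabilistic bounds.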