Probability of Root Cause: A Counterfactual Definition and Its Identification

📅 2026-05-12

🤖 AI Summary
This study addresses the lack of a formal definition of a root cause. Existing root cause analysis methods either restrict attention to root nodes of the causal graph or target causes more broadly and thereby favour proximate ones. Within the potential outcomes framework, the authors give a counterfactual, individual-level definition of a root cause, built on the notion of an individual cause and a counterfactual root condition motivated by mediation analysis. They then introduce a probabilistic measure, the probability of root cause (PRC), which quantifies how probable it is that a candidate variable set is the root cause of a given outcome, conditional on observed evidence. Under standard assumptions, they establish the identifiability of the PRC and derive an explicit identification formula. Two numerical examples illustrate the approach.
📝 Abstract
Attributing an observed outcome to its root cause is a central task in domains ranging from medical diagnosis to engineering fault diagnosis. Existing approaches either equate the root cause with a root node of the causal graph, as in causal-discovery-based root cause analysis, or target causes more broadly and thereby favour proximate ones, as with the probability of causation and posterior causal effects. We argue that this issue stems from the absence of a formal definition of a root cause, which has led to methods designed for other purposes being applied to root cause attribution by default. We address this by giving a formal, individual-level definition of a root cause within the potential outcomes framework, based on the notion of an individual cause and a counterfactual root condition motivated by mediation analysis. Building on this definition, we propose the probability of root cause (PRC), which quantifies how probable it is that a candidate variable set is the root cause of a given outcome, conditional on observed evidence. Under standard assumptions, we establish the identifiability of the PRC and derive an explicit identification formula. Two numerical examples illustrate the approach.
Problem

Research questions and friction points this paper is trying to address.

root cause
causal inference
counterfactual
probability of causation
potential outcomes
Innovation

Methods, ideas, or system contributions that make the work stand out.

root cause
counterfactual
potential outcomes
identifiability
causal mediation