PRIM: Meta-Learned Bayesian Root Cause Analysis

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Root cause analysis in complex systems is hampered by error propagation, unknown causal structure, and high inference cost. This work proposes the PRIM framework, which formulates root cause localization as a Bayesian inference task over a synthetic prior of causal models. By marginalizing structural uncertainty, PRIM implicitly captures shifts in the data-generating mechanism between baseline and anomalous conditions without explicit statistical tests or test-time fitting. Combining meta-learning with Bayesian causal inference, the method introduces a Model-Averaged Causal Estimation (MACE) Transformer neural process that performs zero-shot root cause identification with low latency (17 milliseconds for systems with up to 100 variables). On synthetic benchmarks and the realistic PetShop and CausRCA datasets, PRIM is competitive with methods that assume a known causal graph and outperforms graph-unaware methods on several tasks, with further gains available from lightweight fine-tuning.
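
One way to read "marginalizing structural uncertainty" is as a posterior over the root cause that averages over candidate causal models. The notation below is an assumption made for illustration and does not come from the paper: r is the root-cause variable, G a causal graph, θ its mechanism parameters, D_base and D_anom the baseline and anomalous samples, and q_φ a network trained in the prior-fitted-network style on tasks simulated from the prior.

```latex
% Posterior over the root cause r, averaged over graphs G and parameters \theta
p(r \mid \mathcal{D}_{\mathrm{base}}, \mathcal{D}_{\mathrm{anom}})
  = \sum_{G} \int
      p(r \mid G, \theta, \mathcal{D}_{\mathrm{base}}, \mathcal{D}_{\mathrm{anom}})\,
      p(G, \theta \mid \mathcal{D}_{\mathrm{base}}, \mathcal{D}_{\mathrm{anom}})\,
    \mathrm{d}\theta

% Amortised approximation: fit q_\phi on simulated tasks so that no
% explicit sum or integral is needed at test time
\phi^{\star} = \arg\min_{\phi}\;
  \mathbb{E}_{(G, \theta, r) \sim p(G, \theta, r)}\;
  \mathbb{E}_{(\mathcal{D}_{\mathrm{base}}, \mathcal{D}_{\mathrm{anom}}) \sim p(\cdot \mid G, \theta, r)}
  \bigl[ -\log q_{\phi}(r \mid \mathcal{D}_{\mathrm{base}}, \mathcal{D}_{\mathrm{anom}}) \bigr]
```

Under this reading, a single forward pass of q_φ stands in for the sum over graphs, which is consistent with the claim that no model fitting is needed at test time.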

📝 Abstract

Root cause analysis (RCA) in complex systems is challenging due to error propagation across multiple variables, the need for structural causal knowledge, and the computational cost of inference at test time. We introduce PRIM (Prior-fitted Root cause Identification with Meta-learning), a causal meta-learning approach that frames RCA as a Bayesian inference task over a synthetic prior of causal models. By marginalising out structural uncertainty, PRIM implicitly identifies changes in the data-generating mechanism between baseline and anomalous periods. In doing so, PRIM infers distributional differences without explicit statistical testing, and implicitly learns causal structure without model fitting at test time. Following the simulation-based meta-learning paradigm of prior-fitted networks, PRIM uses a Model-Averaged Causal Estimation (MACE) transformer neural process that jointly attends over observational and anomalous samples and the causal structure of nodes, enabling zero-shot inference in 17 ms for systems with up to 100 variables. Across synthetic benchmarks and two realistic benchmark datasets, PetShop and CausRCA, PRIM is competitive with methods that are aware of the system's causal graphical structure a priori while outperforming graph-unaware methods on several tasks. Lightweight fine-tuning to specific domains and data dynamics improves performance further.
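
The simulation-based setup described above can be illustrated with a toy task sampler. This is a minimal sketch under stated assumptions, not the authors' prior: it assumes linear Gaussian structural causal models, a random DAG in index order, and an anomaly modelled as a mean shift in one node's noise term. The function name `sample_task` and all of its parameters are hypothetical.

```python
import random


def sample_task(rng, n_nodes=5, n_samples=200, edge_prob=0.4, shift=3.0):
    """Sample one synthetic RCA task from a toy prior over linear SCMs.

    Returns (baseline, anomalous, root): two lists of sample rows and the
    index of the node whose mechanism was shifted. Illustrative only.
    """
    # weights[i][j] is non-zero only for i < j, so the graph is a DAG
    # whose topological order is the node index order.
    weights = [
        [rng.gauss(0.0, 1.0) if i < j and rng.random() < edge_prob else 0.0
         for j in range(n_nodes)]
        for i in range(n_nodes)
    ]
    root = rng.randrange(n_nodes)  # ground-truth root cause label

    def simulate(bias):
        rows = []
        for _ in range(n_samples):
            x = []
            for j in range(n_nodes):  # visit nodes in topological order
                parents = sum(weights[i][j] * x[i] for i in range(j))
                x.append(parents + rng.gauss(0.0, 1.0) + bias[j])
            rows.append(x)
        return rows

    baseline = simulate([0.0] * n_nodes)
    # The anomaly shifts only the root node's noise; descendants change
    # through propagation along the graph.
    anomalous = simulate([shift if j == root else 0.0 for j in range(n_nodes)])
    return baseline, anomalous, root


if __name__ == "__main__":
    baseline, anomalous, root = sample_task(random.Random(0))
    print(len(baseline), len(anomalous), root)
```

A meta-learner in this style would be trained on many such (baseline, anomalous, root) triples, so that at test time it maps a new pair of sample sets directly to a distribution over root-cause nodes without refitting.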

Problem

Research questions and friction points this paper is trying to address.

Root Cause Analysis

Causal Inference

Structural Causal Knowledge

Computational Cost

Error Propagation

Innovation

Methods, ideas, or system contributions that make the work stand out.

causal meta-learning

Bayesian inference

zero-shot root cause analysis