🤖 AI Summary
This work addresses root cause identification in multi-stream systems where individual streams may undergo distributional shifts at unknown time points. The authors propose the Conformal Root Cause Analysis (CROC) framework, which, for the first time, enables general root cause localization without assuming any particular form for the data distributions, relying solely on independence across streams and exchangeability within each stream before and after its change point. CROC constructs finite-sample valid confidence sets for the root cause by leveraging conformal p-values, accommodates any distribution-free change detection procedure, and achieves asymptotic sharpness under mild conditions. The framework is further extended to handle inter-stream dependencies. Empirical results indicate that CROC accurately identifies the true root cause, with confidence set coverage aligning with the theoretical guarantees, while maintaining high localization power even in the presence of cross-stream dependencies.
📝 Abstract
We study distribution-free root cause analysis in multi-stream data, where an evolving underlying system is observed through multiple data streams that may each undergo distributional changes at unknown time points. In such settings, the stream exhibiting the earliest change provides a natural starting point for investigating the underlying cause; we refer to the index of this stream as the root-cause index. Leveraging conformal $p$-values, we propose a novel framework, Conformal Root Cause Analysis (CROC), which constructs finite-sample valid confidence sets for the root-cause index under minimal assumptions: the data streams are independent, and within each stream the pre- and post-change observations are sampled exchangeably from arbitrary and unknown distributions. We also establish a universality property, showing that any distribution-free method for root cause localization can be represented within the CROC framework. In addition, under mild regularity conditions and principled score design, our method yields asymptotically sharp confidence sets that efficiently isolate the root cause. Finally, we extend CROC to handle cross-stream dependence efficiently when it is present. Extensive simulations demonstrate accurate localization of the root stream, supporting our theoretical guarantees.
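For readers unfamiliar with the conformal $p$-values mentioned in the abstract, the following is a minimal sketch of the standard textbook construction, not the CROC procedure itself. The function name `conformal_p_value`, the choice of scores, and the convention that larger scores mean "more unusual" are all assumptions made for illustration. When the test score is exchangeable with the calibration scores, this $p$-value is at most $\alpha$ with probability at most $\alpha$, for any underlying distribution.

```python
def conformal_p_value(calibration_scores, test_score):
    """Standard conformal p-value (illustrative helper, not from the paper).

    Assumes larger scores indicate more unusual observations. Returns the
    fraction of scores, counting the test score itself, that are at least
    as large as the test score.
    """
    scores = list(calibration_scores)
    # Count calibration scores that are at least as extreme as the test score.
    n_at_least = sum(1 for s in scores if s >= test_score)
    # The "+1" terms count the test score itself, which keeps the p-value
    # strictly positive and gives finite-sample validity under exchangeability.
    return (1 + n_at_least) / (len(scores) + 1)


# Hypothetical scores, made up purely for illustration.
calibration = [0.1, 0.4, 0.2, 0.3]
p_typical = conformal_p_value(calibration, 0.35)  # one calibration score is >= 0.35
p_extreme = conformal_p_value(calibration, 5.0)   # no calibration score is >= 5.0
```

With four calibration scores, the smallest attainable $p$-value is $1/5$, so the resolution of such a test is limited by the number of calibration points available.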