A Goal-Driven Survey on Root Cause Analysis

📅 2025-10-22
🤖 AI Summary
Problem: Existing root cause analysis (RCA) research lacks a goal-oriented, systematic taxonomy, which leaves task definitions ambiguous and makes progress hard to assess. Method: This paper proposes an RCA classification framework organized around underlying goals rather than the conventional input-data-type taxonomy, and categorizes 135 studies (2014–2025) by core goals such as localizing a faulty service for rapid triage and identifying a specific bug for a definitive fix. Contribution/Results: The survey groups RCA work for cloud incident management by goal, discusses an ultimate goal that serves as an umbrella over the different RCA formulations, and outlines open challenges and future directions.

📝 Abstract
Root Cause Analysis (RCA) is a crucial aspect of incident management in large-scale cloud services. While the term root cause analysis, or RCA, has been widely used, different studies formulate the task differently. This is because the term "RCA" implicitly covers tasks with distinct underlying goals. For instance, the goal of localizing a faulty service for rapid triage is fundamentally different from identifying a specific functional bug for a definitive fix. However, previous surveys have largely overlooked these goal-based distinctions, conventionally categorizing papers by input data types (e.g., metric-based vs. trace-based methods). This leads to the grouping of works with disparate objectives, thereby obscuring the true progress and gaps in the field. Meanwhile, the typical audience of an RCA survey is either newcomers who want to understand the goals and big picture of the task or RCA researchers who want to find past research under the same task formulation. Thus, an RCA survey that organizes the related papers according to their goals is in high demand. To this end, this paper presents a goal-driven framework that categorizes and integrates 135 papers on RCA in the context of cloud incident management, published between 2014 and 2025, based on their diverse goals. In addition to the goal-driven categorization, it discusses the ultimate goal of all RCA papers as an umbrella covering different RCA formulations. Moreover, the paper discusses open challenges and future directions in RCA.

Problem

Research questions and friction points this paper is trying to address.

Classifying root cause analysis papers by goals instead of data types
Addressing diverse RCA objectives in cloud incident management systems
Providing a goal-driven framework to organize 135 RCA research papers

Innovation

Methods, ideas, or system contributions that make the work stand out.

Goal-driven framework categorizing RCA papers
Organizes studies by their objectives rather than by input data type
Integrates 135 RCA papers on cloud incident management (2014–2025)