TAMO: Fine-Grained Root Cause Analysis via Tool-Assisted LLM Agent with Multi-Modality Observation Data

📅 2025-04-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Root cause analysis (RCA) in cloud-native distributed systems traditionally requires heavy manual effort, while existing large language model (LLM)-based approaches struggle to model dynamic service dependencies and to integrate heterogeneous, multi-source observability data (metrics, logs, traces). Method: This paper proposes a tool-augmented multimodal LLM agent architecture. It fuses multi-source observability data into temporally aligned representations, integrates dedicated root-cause localization tools and fault classifiers, and employs dependency-aware prompt engineering to enable precise root-cause identification and context-driven remediation strategy generation under dynamic service topologies. Results: Experiments on multiple heterogeneous public datasets demonstrate significant improvements in root-cause identification accuracy and cross-scenario generalization capability. To the best of our knowledge, this is the first approach to deliver fine-grained, fully automated, structured, and context-consistent end-to-end fault diagnosis together with actionable remediation recommendations.

📝 Abstract
With the development of distributed systems, microservices and cloud-native technologies have become central to modern enterprise software development. Despite bringing significant advantages, these technologies also increase system complexity and operational challenges. Traditional root cause analysis (RCA) struggles to achieve automated fault response and relies heavily on manual intervention. In recent years, large language models (LLMs) have made breakthroughs in contextual inference and domain knowledge integration, providing new solutions for Artificial Intelligence for Operations (AIOps). However, existing LLM-based approaches face three key challenges: text input constraints, dynamic service dependency hallucinations, and context window limitations. To address these issues, we propose a tool-assisted LLM agent with multi-modality observation data, namely TAMO, for fine-grained RCA. It unifies multi-modal observational data into time-aligned representations to extract consistent features and employs specialized root cause localization and fault classification tools for perceiving the contextual environment. This approach overcomes the limitations of LLMs in handling service dependencies that change in real time and raw observational data, and it guides the LLM to generate repair strategies aligned with system contexts by structuring key information into a prompt. Experimental results show that TAMO performs well in root cause analysis on public datasets characterized by heterogeneity and common fault types, demonstrating its effectiveness.
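To make the idea of "time-aligned representations" concrete, here is a minimal sketch of one common way to align metrics, logs, and traces: bucketing all three into shared fixed-width time windows. This is not TAMO's implementation, which the abstract does not detail. The function name `align_to_windows`, the tuple layouts, and the 60-second window are assumptions made purely for illustration.

```python
from collections import defaultdict


def align_to_windows(metrics, logs, traces, window_s=60):
    """Bucket three kinds of observations into shared fixed-width windows.

    Assumed (hypothetical) input layouts:
      metrics: iterable of (timestamp_s, value)
      logs:    iterable of (timestamp_s, message)
      traces:  iterable of (timestamp_s, span_duration_ms)
    """
    windows = defaultdict(
        lambda: {"metric_values": [], "log_count": 0, "span_durations_ms": []}
    )

    def window_start(ts):
        # Round a timestamp down to the start of its window.
        return int(ts // window_s) * window_s

    for ts, value in metrics:
        windows[window_start(ts)]["metric_values"].append(value)
    for ts, _message in logs:
        windows[window_start(ts)]["log_count"] += 1
    for ts, duration_ms in traces:
        windows[window_start(ts)]["span_durations_ms"].append(duration_ms)

    rows = []
    for start in sorted(windows):
        w = windows[start]
        values, spans = w["metric_values"], w["span_durations_ms"]
        rows.append({
            "window_start": start,
            "metric_mean": sum(values) / len(values) if values else None,
            "log_count": w["log_count"],
            "span_mean_ms": sum(spans) / len(spans) if spans else None,
        })
    return rows


if __name__ == "__main__":
    rows = align_to_windows(
        metrics=[(0, 1.0), (30, 3.0), (65, 5.0)],
        logs=[(10, "timeout"), (70, "timeout"), (80, "retry")],
        traces=[(5, 100.0), (61, 200.0)],
    )
    for row in rows:
        print(row)
```

Each output row describes one window across all three modalities, so a downstream model sees consistent features per time step instead of three unaligned streams.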
Problem

Research questions and friction points this paper is trying to address.

Automating root cause analysis in complex distributed systems
Overcoming LLM limitations in handling multi-modal data
Reducing manual intervention in fault diagnosis and repair
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies multi-modal data into time-aligned representations
Employs specialized tools for root cause localization
Structures key information into prompts for LLM guidance
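The last point, structuring key information into a prompt, can be sketched as follows. This is an illustrative assumption about what such a prompt might contain (ranked candidates from a localization tool, a fault type from a classifier, and the current dependency edges), not the prompt template used in the paper. The function `build_rca_prompt` and all field names are hypothetical.

```python
def build_rca_prompt(candidates, fault_type, dependencies):
    """Assemble tool outputs into a structured text prompt for an LLM.

    Assumed (hypothetical) inputs:
      candidates:   list of (service_name, score), ranked best first
      fault_type:   label produced by a fault classifier
      dependencies: list of (caller, callee) edges observed right now
    """
    lines = [
        "You are assisting with root cause analysis of a microservice system.",
        "",
        "Ranked root cause candidates (from a localization tool):",
    ]
    for rank, (service, score) in enumerate(candidates, start=1):
        lines.append(f"{rank}. {service} (score={score:.2f})")

    lines += ["", f"Predicted fault type (from a classifier): {fault_type}", ""]

    lines.append("Current service dependencies (caller -> callee):")
    for caller, callee in dependencies:
        lines.append(f"- {caller} -> {callee}")

    lines += ["", "Propose a remediation plan consistent with the information above."]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_rca_prompt(
        candidates=[("cart", 0.91), ("checkout", 0.40)],
        fault_type="cpu_exhaustion",
        dependencies=[("frontend", "cart"), ("cart", "redis")],
    ))
```

Passing compact tool outputs rather than raw telemetry keeps the prompt short, and supplying the dependency edges explicitly means the model does not have to guess the topology.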
Qi Wang
School of Computer Science and Technology, Shandong University, Qingdao 266237, China
Xiao Zhang
School of Computer Science and Technology, Shandong University, Qingdao 266237, China
Mingyi Li
School of Computer Science and Technology, Shandong University, Qingdao 266237, China
Yuan Yuan
School of Software & Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250000, China
Mengbai Xiao
Shandong University
Fuzhen Zhuang
Institute of Artificial Intelligence, SKLSDE, School of Computer Science, Beihang University, Beijing 100191, China
Dongxiao Yu
Professor of Computer Science, Shandong University
Distributed Computing, Wireless Networking, Graph Algorithms