Characterizing Faults in Agentic AI: A Taxonomy of Types, Symptoms, and Root Causes

📅 2026-03-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current agentic AI systems lack a systematic account of how failures originate, manifest, and propagate, which limits their reliability. This study addresses the gap with a large-scale empirical analysis of 13,602 issues and pull requests from 40 open-source projects, applying grounded theory coding to a stratified sample of 385 faults. It proposes a three-layer taxonomy for agentic AI comprising 37 fault types (grouped into 13 categories), 13 symptom classes, and 12 root cause categories. The analysis points to a recurring failure mechanism: a mismatch between probabilistically generated outputs and deterministic interface constraints. Apriori-based association rule mining surfaces typical propagation paths, such as token management faults leading to authentication failures. In a validation survey of 145 developers, the taxonomy was rated as representative of real-world failures (mean 3.97/5), and 83.8% of respondents reported that it covered faults they had encountered.
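To make the mismatch between generated outputs and deterministic interfaces concrete, the following minimal Python sketch checks a model-generated tool call against a fixed argument schema. The schema `SEND_EMAIL_SCHEMA`, the function `validate_tool_call`, and the sample strings are hypothetical illustrations and do not come from the paper or from any particular agent framework.

```python
import json

# Hypothetical tool interface: argument name -> required Python type.
SEND_EMAIL_SCHEMA = {"to": str, "subject": str, "retries": int}


def validate_tool_call(raw_output: str, schema: dict) -> list[str]:
    """Return a list of constraint violations; an empty list means the call is valid."""
    try:
        args = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        # The generated text is not even parseable, so no argument can be checked.
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for name, expected in schema.items():
        if name not in args:
            problems.append(f"missing argument: {name}")
        elif type(args[name]) is not expected:
            # Exact type match, so that e.g. True is not accepted where an int is required.
            problems.append(
                f"wrong type for {name}: expected {expected.__name__}, "
                f"got {type(args[name]).__name__}"
            )
    for name in args:
        if name not in schema:
            problems.append(f"unexpected argument: {name}")
    return problems


# Three invented model outputs: well formed, plausible but mistyped, and truncated.
well_formed = '{"to": "a@example.com", "subject": "Hi", "retries": 3}'
mistyped = '{"to": "a@example.com", "subject": "Hi", "retries": "three"}'
truncated = '{"to": "a@example.com", "subj'

for sample in (well_formed, mistyped, truncated):
    print(validate_tool_call(sample, SEND_EMAIL_SCHEMA))
```

The second and third samples read as plausible text yet violate the interface, which is the kind of gap the summary describes. A check like this at the tool boundary turns such gaps into explicit errors before they propagate.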

📝 Abstract

Agentic AI systems combine large language model (LLM) reasoning with external tool invocation and long-horizon task execution. Although these systems are increasingly deployed in practice, their architectural composition introduces reliability challenges that differ from those in traditional software systems and standalone LLM applications. However, there is limited empirical understanding of how faults originate, manifest, and propagate in real-world agentic AI systems. To address this gap, we conduct a large-scale empirical study of faults in agentic AI systems. We collect 13,602 issues and pull requests from 40 open-source agentic AI repositories and apply stratified sampling to select 385 faults for in-depth qualitative analysis. Using grounded theory, we derive taxonomies of fault types, observable symptoms, and root causes. We further apply Apriori-based association rule mining to identify statistically significant relationships among faults, symptoms, and root causes, revealing common fault propagation patterns. Finally, we validate the taxonomy through a developer study with 145 practitioners. Our analysis identifies 37 distinct fault types grouped into 13 higher-level fault categories, along with 13 classes of observable symptoms and 12 categories of root causes. The results show that many failures originate from mismatches between probabilistically generated artifacts and deterministic interface constraints, frequently involving dependency integration, data validation, and runtime environment handling. Association rule mining further reveals recurring propagation pathways across system components, such as token management faults leading to authentication failures and datetime handling defects causing scheduling anomalies. Practitioners rated the taxonomy as representative of real-world failures (mean = 3.97/5), and 83.8% reported that it covered faults they had encountered.
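The association rule mining step can be illustrated with a small, self-contained Python sketch of Apriori-style mining over coded fault records. It is not the authors' pipeline: the records, label names, and thresholds below are invented for illustration, and the sketch ranks rules by support, confidence, and lift without the statistical significance testing the abstract mentions.

```python
from itertools import combinations

# Hypothetical coded faults: each record is the set of labels assigned to one fault.
# The label names echo examples in the abstract; the records themselves are invented.
RECORDS = [
    {"fault:token_management", "symptom:auth_failure"},
    {"fault:token_management", "symptom:auth_failure"},
    {"fault:token_management", "symptom:auth_failure"},
    {"fault:datetime_handling", "symptom:scheduling_anomaly"},
    {"fault:datetime_handling", "symptom:scheduling_anomaly"},
    {"fault:dependency_integration", "symptom:auth_failure"},
    {"fault:dependency_integration", "symptom:crash"},
    {"fault:data_validation", "symptom:crash"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= record for record in records) / len(records)


def frequent_itemsets(records, min_support):
    """Level-wise Apriori search: size-k candidates come only from frequent size-(k-1) sets."""
    items = {item for record in records for item in record}
    level = [frozenset([i]) for i in items if support(frozenset([i]), records) >= min_support]
    frequent = {}
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, records)
        size = len(level[0]) + 1
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Apriori pruning: keep a candidate only if all its (size-1)-subsets are frequent.
        level = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
            and support(c, records) >= min_support
        ]
    return frequent


def mine_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) for single-item consequents."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            consequent = frozenset([item])
            antecedent = itemset - consequent
            confidence = supp / frequent[antecedent]
            lift = confidence / frequent[consequent]
            if confidence >= min_confidence:
                rules.append((set(antecedent), set(consequent), supp, confidence, lift))
    return rules


# Thresholds are arbitrary choices for this toy data set.
found = mine_rules(frequent_itemsets(RECORDS, min_support=0.25), min_confidence=0.8)
for antecedent, consequent, supp, conf, lift in found:
    print(antecedent, "->", consequent, f"support={supp:.3f} confidence={conf:.2f} lift={lift:.2f}")
```

On this toy data, the token management fault implies the authentication failure symptom with confidence 1.0, while the reverse rule falls below the confidence threshold because authentication failures also co-occur with another fault. A lift above 1 indicates that the pair co-occurs more often than it would if the labels were independent.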

Problem

Research questions and friction points this paper is trying to address.

Agentic AI

Fault characterization

Taxonomy

Root cause analysis

System reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic AI

Fault taxonomy

Association rule mining