🤖 AI Summary
This study addresses a critical gap in existing research, which has predominantly focused on failures at the LLM agent level while overlooking underlying framework deficiencies. We present the first systematic empirical analysis of 998 defect reports from CrewAI and LangChain, constructing a comprehensive defect taxonomy spanning five stages of the agent lifecycle. Our analysis uncovers 15 root causes and 7 observable symptoms, revealing that API misuse, API incompatibility, and documentation desynchronization are the predominant issues, concentrated particularly in the 'Self-Action' stage. These defects commonly manifest as functional errors, crashes, and build failures, severely disrupting task execution and control flow. This work provides foundational insights to guide reliability improvements in LLM agent frameworks.
📝 Abstract
LLM agents have been widely adopted in real-world applications, relying on agent frameworks for workflow execution and multi-agent coordination. As these systems scale, understanding bugs in the underlying agent frameworks becomes critical. However, existing work mainly focuses on agent-level failures, overlooking framework-level bugs. To address this gap, we conduct an empirical study of 998 bug reports from CrewAI and LangChain, constructing a taxonomy of 15 root causes and 7 observable symptoms across five agent lifecycle stages: 'Agent Initialization', 'Perception', 'Self-Action', 'Mutual Interaction', and 'Evolution'. Our findings show that agent framework bugs mainly arise from 'API Misuse', 'API Incompatibility', and 'Documentation Desync', and are largely concentrated in the 'Self-Action' stage. Symptoms typically appear as 'Functional Error', 'Crash', and 'Build Failure', reflecting disruptions to task progression and control flow.