XAgen: An Explainability Tool for Identifying and Correcting Failures in Multi-Agent Workflows

📅 2025-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the explainability bottleneck in multi-agent workflows—characterized by difficulties in observing, attributing, and repairing failures—this paper introduces the first integrated debugging framework combining interactive log visualization, human feedback loops, and LLM-as-a-judge automated error detection. It establishes a human-centered paradigm for multi-agent debugging. The framework enables users with diverse technical backgrounds to track execution traces in real time, collaboratively annotate anomalies, and leverage large language models for fine-grained error classification and root-cause attribution. A user study demonstrates that the tool significantly improves fault localization efficiency (reducing time by 57% on average) and attribution accuracy (+39%). Moreover, it supports iterative workflow configuration refinement grounded in human feedback. Empirical evaluation on real-world multi-agent workflows confirms both practical utility and generalizability.

📝 Abstract
As multi-agent systems powered by Large Language Models (LLMs) are increasingly adopted in real-world workflows, users with diverse technical backgrounds are now building and refining their own agentic processes. However, these systems can fail in opaque ways, making it difficult for users to observe, understand, and correct errors. We conducted formative interviews with 12 practitioners to identify mismatches between existing observability tools and users' needs. Based on these insights, we designed XAgen, an explainability tool that supports users with varying AI expertise through three core capabilities: log visualization for glanceable workflow understanding, human-in-the-loop feedback to capture expert judgment, and automatic error detection via an LLM-as-a-judge. In a user study with 8 participants, XAgen helped users more easily locate failures, attribute them to specific agents or steps, and iteratively improve configurations. Our findings surface human-centered design guidelines for explainable agentic AI development and highlight opportunities for more context-aware interactive debugging.
Problem

Research questions and friction points this paper is trying to address.

Multi-agent workflows fail in opaque ways that are hard to observe and understand
Users lack mechanisms to correct errors through human-in-the-loop feedback
Existing observability tools offer little explainable support for improving workflow configurations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Log visualization for glanceable workflow understanding
Human-in-the-loop feedback to capture expert judgment
Automatic error detection via an LLM-as-a-judge
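The paper does not publish an implementation, but the LLM-as-a-judge capability can be sketched as a pass over an execution trace that asks a judge model to classify each step and name the responsible agent. Everything below is a hypothetical illustration: the `llm` callable, the error taxonomy, the prompt, and the JSON schema are all assumptions, not XAgen's actual interface.

```python
import json

# Illustrative error taxonomy; XAgen's real fine-grained classes are not public.
ERROR_TYPES = ["tool_misuse", "hallucination", "bad_handoff", "task_incomplete", "none"]

JUDGE_PROMPT = """You are reviewing one step of a multi-agent workflow.
Agent: {agent}
Input: {step_input}
Output: {step_output}
Classify any error as one of {types} and name the responsible agent.
Respond with JSON: {{"error_type": ..., "culprit": ..., "rationale": ...}}"""

def judge_step(llm, agent, step_input, step_output):
    """Ask the judge model (any prompt -> response callable) to classify one step."""
    prompt = JUDGE_PROMPT.format(
        agent=agent, step_input=step_input,
        step_output=step_output, types=ERROR_TYPES,
    )
    verdict = json.loads(llm(prompt))
    if verdict.get("error_type") not in ERROR_TYPES:
        verdict["error_type"] = "none"  # fall back on labels outside the taxonomy
    return verdict

def judge_trace(llm, trace):
    """Return (step_index, verdict) for every step the judge flags as erroneous."""
    return [
        (i, v) for i, step in enumerate(trace)
        if (v := judge_step(llm, **step))["error_type"] != "none"
    ]
```

In a human-in-the-loop setup like the one the paper describes, the flagged `(step_index, verdict)` pairs would be surfaced in the log visualization for users to confirm, correct, or annotate, rather than acted on automatically.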