TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces

📅 2026-02-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses failure diagnosis and root cause analysis for agentic systems, whose execution traces are long and structurally intricate. Existing approaches either miss critical behavioral details in the raw traces or look only at final task outcomes. The authors propose TraceSIR, a multi-agent framework built on TraceFormat, a structured abstraction of execution traces, and on three specialized agents: StructureAgent for trace compression, InsightAgent for fine-grained issue localization and root cause diagnosis, and ReportAgent for aggregating insights across tasks into actionable reports. On the newly constructed TraceBench benchmark, TraceSIR is reported to significantly outperform existing approaches in report coherence, informativeness, and actionability.

📝 Abstract

Agentic systems augment large language models with external tools and iterative decision making, enabling complex tasks such as deep research, function calling, and coding. However, their long and intricate execution traces make failure diagnosis and root cause analysis extremely challenging. Manual inspection does not scale, while directly applying LLMs to raw traces is hindered by input length limits and unreliable reasoning. Focusing solely on final task outcomes further discards critical behavioral information required for accurate issue localization. To address these issues, we propose TraceSIR, a multi-agent framework for structured analysis and reporting of agentic execution traces. TraceSIR coordinates three specialized agents: (1) StructureAgent, which introduces a novel abstraction format, TraceFormat, to compress execution traces while preserving essential behavioral information; (2) InsightAgent, which performs fine-grained diagnosis including issue localization, root cause analysis, and optimization suggestions; (3) ReportAgent, which aggregates insights across task instances and generates comprehensive analysis reports. To evaluate TraceSIR, we construct TraceBench, covering three real-world agentic scenarios, and introduce ReportEval, an evaluation protocol for assessing the quality and usability of analysis reports aligned with industry needs. Experiments show that TraceSIR consistently produces coherent, informative, and actionable reports, significantly outperforming existing approaches across all evaluation dimensions. Our project and video are publicly available at https://github.com/SHU-XUN/TraceSIR.
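The three-stage flow described in the abstract (compress each trace, diagnose it, then aggregate across tasks) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Step` and `Insight` schemas, the function names, and the rule-based logic are hypothetical stand-ins, not TraceFormat or the authors' implementation, and the actual agents are LLM-driven rather than rule-based.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    """One raw trace event (hypothetical schema, not the paper's TraceFormat)."""
    role: str        # e.g. "assistant" or "tool"
    content: str     # raw text of the step
    ok: bool = True  # whether the step succeeded


@dataclass
class Insight:
    """Per-task diagnosis (hypothetical schema)."""
    task_id: str
    failed_step: Optional[int]  # index of the first failing step, if any
    root_cause: str
    suggestion: str


def structure(steps: List[Step], max_chars: int = 80) -> List[Step]:
    """Stand-in for StructureAgent: truncate long contents, keep order and status."""
    return [Step(s.role, s.content[:max_chars], s.ok) for s in steps]


def insight(task_id: str, steps: List[Step]) -> Insight:
    """Stand-in for InsightAgent: locate the first failing step with a simple rule."""
    for i, s in enumerate(steps):
        if not s.ok:
            cause = "tool_error" if s.role == "tool" else "reasoning_error"
            return Insight(task_id, i, cause, f"inspect step {i} ({s.role})")
    return Insight(task_id, None, "none", "no action needed")


def report(insights: List[Insight]) -> str:
    """Stand-in for ReportAgent: aggregate root causes across task instances."""
    counts = Counter(x.root_cause for x in insights if x.failed_step is not None)
    lines = [f"tasks analyzed: {len(insights)}"]
    lines += [f"{cause}: {n}" for cause, n in sorted(counts.items())]
    return "\n".join(lines)


def run_pipeline(traces: Dict[str, List[Step]]) -> str:
    """Chain the three stages: structure -> insight -> report."""
    return report([insight(tid, structure(steps)) for tid, steps in traces.items()])


if __name__ == "__main__":
    demo = {
        "t1": [Step("assistant", "plan the search"), Step("tool", "timeout", ok=False)],
        "t2": [Step("assistant", "answer directly")],
    }
    print(run_pipeline(demo))
```

The sketch keeps compression as a separate step before diagnosis, mirroring the abstract's point that raw traces are too long to analyze directly, and it defers aggregation to the final stage so that the report summarizes patterns across tasks rather than a single run.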

Problem

Research questions and friction points this paper is trying to address.

agentic execution traces

failure diagnosis

root cause analysis

trace analysis

behavioral information

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent framework

execution trace analysis

structured abstraction