🤖 AI Summary
Existing agent attribution methods are largely confined to failure scenarios: they localize explicit errors in unsuccessful trajectories but cannot explain why an agent behaves as it does under arbitrary task outcomes, and they lack a general mechanism for attributing the agent's reasoning process. This work proposes a unified attribution framework applicable to any behavioral outcome, which performs fine-grained attribution through hierarchical localization. At the component level, it analyzes temporal likelihood dynamics to identify critical interaction steps; at the sentence level, it applies perturbation-based analysis to trace the specific textual evidence that drives agent behavior. Experiments across diverse agent scenarios, including standard tool use and memory-induced bias, show that the method identifies the key historical events and specific sentences that influence decisions, laying groundwork for more accountable and safer agent systems.
📝 Abstract
Large Language Model (LLM)-based agents are widely used in real-world applications such as customer service, web navigation, and software engineering. As these systems become more autonomous and are deployed at scale, understanding why an agent takes a particular action becomes increasingly important for accountability and governance. However, existing research focuses predominantly on failure attribution, which localizes explicit errors in unsuccessful trajectories and is insufficient for explaining why agents behave as they do. To bridge this gap, we propose a novel framework for general agentic attribution, designed to identify the internal factors that drive agent actions regardless of the task outcome. The framework operates hierarchically to manage the complexity of agent interactions. At the component level, we use temporal likelihood dynamics to identify critical interaction steps; at the sentence level, we refine this localization with perturbation-based analysis to isolate the specific textual evidence. We validate the framework across a diverse suite of agentic scenarios, including standard tool use and subtle reliability risks such as memory-induced bias. Experimental results show that the framework reliably pinpoints the pivotal historical events and sentences behind agent behavior, a critical step toward safer and more accountable agentic systems. Code is available at https://github.com/AI45Lab/AgentDoG.
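The abstract does not give the exact scoring rules, so the following is only a minimal sketch of the two-level idea under stated assumptions. It assumes a scorer that returns log p(action | context). It also assumes that the component level can be read as the per-step change in that score as each interaction step is appended. Finally, it assumes the sentence level can be read as a leave-one-out deletion of each sentence in the selected step. All names here are hypothetical and not taken from the AgentDoG repository: `component_attribution`, `sentence_attribution`, `render`, and `toy_score`. In particular, `toy_score` is a keyword-count stand-in for a real LLM log-likelihood.

```python
from typing import Callable, List, Sequence

# Assumption: a step is a list of sentences, and a scorer returns
# log p(action | context). In practice the score would come from the agent's LLM.
Step = List[str]
Scorer = Callable[[str, str], float]


def render(steps: Sequence[Step]) -> str:
    """Flatten the interaction history into a single context string."""
    return "\n".join(" ".join(sentences) for sentences in steps)


def component_attribution(steps: Sequence[Step], action: str, score: Scorer) -> List[float]:
    """Hypothetical component level: score gain for `action` as each step is appended.

    delta[t] = score(steps[:t+1]) - score(steps[:t]); large values flag
    candidate critical steps.
    """
    deltas = []
    prev = score(render([]), action)  # score with an empty history
    for t in range(len(steps)):
        cur = score(render(steps[: t + 1]), action)
        deltas.append(cur - prev)
        prev = cur
    return deltas


def sentence_attribution(steps: Sequence[Step], k: int, action: str, score: Scorer) -> List[float]:
    """Hypothetical sentence level: drop in the action score when each sentence of step k is deleted."""
    base = score(render(steps), action)
    drops = []
    for i in range(len(steps[k])):
        ablated = list(steps)
        ablated[k] = steps[k][:i] + steps[k][i + 1 :]  # remove sentence i only
        drops.append(base - score(render(ablated), action))
    return drops


def toy_score(context: str, action: str) -> float:
    """Toy stand-in for an LLM log-likelihood: counts context words that appear in the action."""
    action_words = set(action.lower().split())
    words = (w.strip(".,:;!?").lower() for w in context.split())
    return float(sum(w in action_words for w in words))


steps = [
    ["User asks about an order."],
    ["Memory: user previously requested a refund.", "User prefers email."],
    ["Tool returns order status: delivered."],
]
action = "issue refund"
deltas = component_attribution(steps, action, toy_score)
print(deltas)  # [0.0, 1.0, 0.0]
k = max(range(len(deltas)), key=deltas.__getitem__)  # most influential step: 1
print(sentence_attribution(steps, k, action, toy_score))  # [1.0, 0.0]
```

In this toy run, the memory step is flagged at the component level. Deleting its first sentence is what removes the support for the action. This mirrors, in a simplified form, the memory-induced bias scenario the abstract mentions. The coarse-to-fine split keeps the number of perturbations small, because sentence-level ablation runs only inside the step the component level selected.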