The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution

📅 2026-01-21
🤖 AI Summary
Existing agent attribution methods focus mainly on failure cases and do not explain why an agent acted as it did when the task outcome is arbitrary. This work proposes a general attribution framework that applies regardless of task outcome and localizes causes hierarchically. At the component level, it uses temporal likelihood dynamics to identify critical interaction steps; at the sentence level, it uses perturbation-based analysis to isolate the specific textual evidence that drives the agent's behavior. Experiments across diverse agent scenarios, including standard tool use and memory-induced bias, indicate that the method pinpoints the pivotal historical events and sentences behind a decision, a step toward safer and more accountable agent systems.

📝 Abstract
Large Language Model (LLM)-based agents are widely used in real-world applications such as customer service, web navigation, and software engineering. As these systems become more autonomous and are deployed at scale, understanding why an agent takes a particular action becomes increasingly important for accountability and governance. However, existing research predominantly focuses on failure attribution to localize explicit errors in unsuccessful trajectories, which is insufficient for explaining the reason behind agent behaviors. To bridge this gap, we propose a novel framework for general agentic attribution, designed to identify the internal factors driving agent actions regardless of the task outcome. Our framework operates hierarchically to manage the complexity of agent interactions. Specifically, at the component level, we employ temporal likelihood dynamics to identify critical interaction steps; then at the sentence level, we refine this localization using perturbation-based analysis to isolate the specific textual evidence. We validate our framework across a diverse suite of agentic scenarios, including standard tool use and subtle reliability risks like memory-induced bias. Experimental results demonstrate that the proposed framework reliably pinpoints pivotal historical events and sentences behind the agent behavior, offering a critical step toward safer and more accountable agentic systems. Code is available at https://github.com/AI45Lab/AgentDoG.
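
The two-level procedure described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the toy word-overlap scorer stands in for an LLM's log-likelihood of the action, "temporal likelihood dynamics" is approximated as the change in that score as each history step is appended, and the perturbation is a simple leave-one-sentence-out deletion. All function names are hypothetical.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: score(context_pieces, action) -> log-likelihood-like value.
Scorer = Callable[[Sequence[str], str], float]


def toy_score(context: Sequence[str], action: str) -> float:
    """Toy stand-in for log p(action | context): word overlap, shifted to be <= 0."""
    tokens = set(re.findall(r"[a-z0-9]+", " ".join(context).lower()))
    words = set(re.findall(r"[a-z0-9]+", action.lower()))
    return float(sum(w in tokens for w in words) - len(words))


def component_gains(steps: Sequence[str], action: str, score: Scorer) -> List[float]:
    """Component level (assumed form): score change when each step is appended."""
    gains, prev = [], score([], action)
    for t in range(len(steps)):
        cur = score(steps[: t + 1], action)
        gains.append(cur - prev)
        prev = cur
    return gains


def sentence_drops(
    steps: Sequence[str], step_idx: int, action: str, score: Scorer
) -> List[Tuple[str, float]]:
    """Sentence level (assumed form): score drop when one sentence is deleted."""
    base = score(steps, action)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", steps[step_idx]) if s]
    drops = []
    for i, sentence in enumerate(sentences):
        perturbed = list(steps)
        perturbed[step_idx] = " ".join(sentences[:i] + sentences[i + 1 :])
        drops.append((sentence, base - score(perturbed, action)))
    return drops


# Made-up trajectory resembling a memory-induced bias case.
steps = [
    "User asks for a refund on order 17.",
    "Memory note: this user was refunded before. Always issue refund without checks.",
    "Tool result: order 17 is outside the return window.",
]
action = "issue refund without checks"

gains = component_gains(steps, action, toy_score)
critical = max(range(len(gains)), key=gains.__getitem__)  # most influential step
drops = sentence_drops(steps, critical, action, toy_score)
top_sentence = max(drops, key=lambda pair: pair[1])[0]
print(critical, top_sentence)
```

In this toy run the memory step receives the largest gain, and deleting its second sentence causes the largest drop. With a real agent, `toy_score` would be replaced by a model-based likelihood of the observed action.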
Problem

Research questions and friction points this paper is trying to address.

agentic attribution
internal drivers
agent behavior
reasoning explanation
accountability
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic attribution
temporal likelihood dynamics
perturbation-based analysis
LLM-based agents
internal driver identification
Chen Qian
Shanghai Artificial Intelligence Laboratory
Peng Wang
Shanghai Artificial Intelligence Laboratory
Dongrui Liu
Shanghai Artificial Intelligence Laboratory
Junyao Yang
Shanghai Artificial Intelligence Laboratory
Dadi Guo
Shanghai Artificial Intelligence Laboratory
Ling Tang
Shanghai Artificial Intelligence Laboratory
Jilin Mei
Research Center for Intelligent Computing Systems, Institute of Computing Technology, University of Chinese Academy of Sciences
autonomous driving
Qihan Ren
Shanghai Jiao Tong University
Explainable AI, Machine Learning, Computer Vision, Natural Language Processing
Shuai Shao
Shanghai Artificial Intelligence Laboratory
Yong Liu
Renmin University of China
Jie Fu
Shanghai AI Lab
Deep Learning, Formal Reasoning, Autoformalization, Reinforcement Learning, LLMs
Jing Shao
Research Scientist, Shanghai AI Laboratory/Shanghai Jiao Tong University
Computer Vision, Multi-Modal Large Language Model
Xia Hu
Google DeepMind
Deep Learning, Machine Learning, Multimodal