AgenTRIM: Tool Risk Mitigation for Agentic AI

📅 2026-01-18
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the security risks, such as indirect prompt injection, and the performance degradation that arise from improper tool permission configurations in AI agents, which often manifest as tool overuse or underuse. The authors propose AgenTRIM, a framework that, for the first time, formally characterizes tool-induced capability imbalance in agents. Without modifying the agent's internal logic, AgenTRIM detects and mitigates risk at runtime by combining offline reconstruction and verification of the agent's tool interface with online dynamic filtering based on the principle of least privilege. By integrating code and execution-trace analysis, state-aware validation, and adaptive call filtering, the approach significantly reduces attack success rates on the AgentDojo benchmark while maintaining high task completion rates, and it remains robust to description-based attacks while effectively enforcing explicit safety policies.

📝 Abstract
AI agents are autonomous systems that combine LLMs with external tools to solve complex tasks. While such tools extend capability, improper tool permissions introduce security risks such as indirect prompt injection and tool misuse. We characterize these failures as unbalanced tool-driven agency. Agents may retain unnecessary permissions (excessive agency) or fail to invoke required tools (insufficient agency), amplifying the attack surface and reducing performance. We introduce AgenTRIM, a framework for detecting and mitigating tool-driven agency risks without altering an agent's internal reasoning. AgenTRIM addresses these risks through complementary offline and online phases. Offline, AgenTRIM reconstructs and verifies the agent's tool interface from code and execution traces. At runtime, it enforces per-step least-privilege tool access through adaptive filtering and status-aware validation of tool calls. Evaluating on the AgentDojo benchmark, AgenTRIM substantially reduces attack success while maintaining high task performance. Additional experiments show robustness to description-based attacks and effective enforcement of explicit safety policies. Together, these results demonstrate that AgenTRIM provides a practical, capability-preserving approach to safer tool use in LLM-based agents.
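The abstract describes enforcing per-step least-privilege tool access via adaptive filtering and status-aware validation. As a rough illustration only (not the paper's implementation; the `ToolFilter` class, the phase-to-permission map, and the example tool names are all hypothetical), a per-step filter might expose to the agent only the minimal tool set its current task phase requires and reject any out-of-scope call:

```python
# Hypothetical sketch of per-step least-privilege tool filtering.
# The phase-permission map stands in for what AgenTRIM would derive
# offline from code and execution-trace analysis.
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict


class ToolFilter:
    def __init__(self, phase_permissions: dict):
        # Map: task phase -> minimal set of tools that phase needs.
        self.phase_permissions = phase_permissions

    def allowed_tools(self, phase: str) -> set:
        """Expose only the tools the current phase requires."""
        return set(self.phase_permissions.get(phase, ()))

    def validate(self, phase: str, call: ToolCall) -> bool:
        """Reject any call outside the phase's least-privilege set."""
        return call.name in self.allowed_tools(phase)


# Example: an email-triage agent may read mail while triaging,
# but may only send mail during the reply phase.
f = ToolFilter({
    "triage": {"read_email", "search_inbox"},
    "reply": {"read_email", "send_email"},
})

assert f.validate("triage", ToolCall("read_email", {}))
assert not f.validate("triage", ToolCall("send_email", {"to": "a@b.c"}))
```

This captures only the online filtering idea; the paper additionally verifies the reconstructed tool interface offline and applies state-aware validation to individual calls.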
Problem

Research questions and friction points this paper is trying to address.

tool risk
agentic AI
security
least-privilege
tool misuse
Innovation

Methods, ideas, or system contributions that make the work stand out.

tool risk mitigation
agentic AI
least-privilege access
offline-online verification
LLM-based agents