LoopTrap: Termination Poisoning Attacks on LLM Agents

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

career value

202K/year

🤖 AI Summary

This work addresses the vulnerability of large language model (LLM) agents to malicious prompt injections during iterative execution, which can compromise their ability to determine task termination and induce infinite loops. The authors propose LoopTrap, a novel framework that formally defines “termination pollution” attacks for the first time. LoopTrap adaptively generates targeted adversarial prompts through lightweight behavioral profiling, self-scoring strategy selection, and dynamic prompt injection. The approach uncovers transferable patterns in termination-judgment weaknesses across diverse LLM agents and establishes an automated, scalable red-teaming mechanism. Evaluated on eight mainstream LLM agents across 60 tasks, LoopTrap achieves an average step amplification of 3.57× (up to 25×), significantly enhancing both attack efficiency and generalization capability.

📝 Abstract

Modern LLM agents solve complex tasks by operating in iterative execution loops, where they repeatedly reason, act, and self-evaluate progress to determine when a task is complete. In this work, we show that while this self-directed loop facilitates autonomy, it also introduces a critical risk: by injecting malicious prompts into the agent's context, an adversary can distort the agent's termination judgment, making it believe the task remains incomplete and leading to unbounded computation.To understand this threat, we define and systematically characterize it as Termination Poisoning and design 10 representative attack strategies. Through a empirical study spanning 8 LLM agents and 60 tasks, we demonstrate that different LLM agents exhibit distinct behavioral signatures that determine which strategies succeed. These transferable patterns can serve as principled guidance for crafting effective attacks against previously unseen agents and tasks, enabling scalable red-teaming beyond manually designed templates. Building on these insights, we introduce LoopTrap, an automated red-teaming framework that synthesizes target-specific malicious prompts by exploiting agent behavioral tendencies. LoopTrap first constructs a behavioral profile of the target agent along four vulnerability dimensions via lightweight probing. It then performs adaptive trap synthesis, routing to the most effective strategy and selecting optimal injections via a self-scoring mechanism. Finally, successful traps are abstracted into a reusable skill library, while failed attempts are refined through self-reflection, ensuring continuous improvement. Extensive evaluation shows that LoopTrap achieves an average of 3.57$\times$ step amplification across 8 mainstream agents, with a peak of 25$\times$.

Problem

Research questions and friction points this paper is trying to address.

Termination Poisoning

LLM Agents

Red-Teaming

Prompt Injection

Autonomous Loops

Innovation

Methods, ideas, or system contributions that make the work stand out.

Termination Poisoning

LLM Agents

Automated Red-Teaming