TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
LLM agents deployed in real-world applications face adversarial threats, including tool poisoning and malicious instructions, that can compromise execution integrity and lead to data leakage and financial loss. Conventional defenses rely on manually crafted, predefined rules, which are laborious to write and leave coverage gaps that produce false negatives. This paper proposes TraceAegis, a provenance-based framework that needs no hand-written rules: it reconstructs hierarchical behavioral units from agent execution traces, automatically summarizes them into constrained behavioral rules, and checks each trace for both execution-order violations and semantic inconsistency. On the authors' benchmark, TraceAegis-Bench, which covers healthcare and corporate procurement scenarios, TraceAegis identifies the majority of abnormal behaviors of both kinds.

📝 Abstract
LLM-based agents have demonstrated promising adaptability in real-world applications. However, these agents remain vulnerable to a wide range of attacks, such as tool poisoning and malicious instructions, that compromise their execution flow and can lead to serious consequences like data breaches and financial loss. Existing studies typically attempt to mitigate such anomalies by predefining specific rules and enforcing them at runtime to enhance safety. Yet, designing comprehensive rules is difficult, requiring extensive manual effort and still leaving gaps that result in false negatives. As agent systems evolve into complex software systems, we take inspiration from software system security and propose TraceAegis, a provenance-based analysis framework that leverages agent execution traces to detect potential anomalies. In particular, TraceAegis constructs a hierarchical structure to abstract stable execution units that characterize normal agent behaviors. These units are then summarized into constrained behavioral rules that specify the conditions necessary to complete a task. By validating execution traces against both hierarchical and behavioral constraints, TraceAegis is able to effectively detect abnormal behaviors. To evaluate the effectiveness of TraceAegis, we introduce TraceAegis-Bench, a dataset covering two representative scenarios: healthcare and corporate procurement. Each scenario includes 1,300 benign behaviors and 300 abnormal behaviors, where the anomalies either violate the agent's execution order or break the semantic consistency of its execution sequence. Experimental results demonstrate that TraceAegis achieves strong performance on TraceAegis-Bench, successfully identifying the majority of abnormal behaviors.
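
The abstract does not spell out how constraints are derived from traces, so the following is only a minimal sketch of the general idea of checking execution order against rules learned from benign runs. It assumes a trace is a list of tool names and uses a simple "a always precedes b" rule; the tool names, `mine_precedence_rules`, and `violations` are hypothetical and are not taken from the paper.

```python
from itertools import product


def mine_precedence_rules(benign_traces):
    """Return pairs (a, b) where, in every benign trace that contains b,
    a occurs before the first occurrence of b. (Illustrative rule form only.)"""
    tools = {tool for trace in benign_traces for tool in trace}
    rules = set()
    for a, b in product(tools, repeat=2):
        if a == b:
            continue
        traces_with_b = [t for t in benign_traces if b in t]
        if traces_with_b and all(a in t[: t.index(b)] for t in traces_with_b):
            rules.add((a, b))
    return rules


def violations(trace, rules):
    """List the rules (a, b) that the trace breaks: b runs without a earlier."""
    broken = []
    for a, b in sorted(rules):
        if b in trace and a not in trace[: trace.index(b)]:
            broken.append((a, b))
    return broken


# Hypothetical procurement traces (tool names are assumptions).
benign = [
    ["create_request", "get_approval", "place_order"],
    ["create_request", "check_budget", "get_approval", "place_order"],
]
rules = mine_precedence_rules(benign)

# An order is placed without approval: the learned rule is violated.
suspicious = ["create_request", "place_order"]
print(violations(suspicious, rules))
```

A rule set mined this way flags any trace that skips a step seen before a given tool in all benign runs; it says nothing about whether the arguments passed between steps are consistent, which is a separate check.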

Problem

Research questions and friction points this paper is trying to address.

Detecting tool poisoning and malicious instruction attacks on LLM agents
Addressing limitations of manual rule-based anomaly detection systems
Securing agent execution flow through hierarchical behavioral analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Provenance-based analysis framework using execution traces
Hierarchical structure abstracts stable execution units
Behavioral rules validate execution traces for anomalies
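
The abstract describes a second anomaly class, traces that break semantic consistency, without detailing how it is checked. One plausible reading, sketched here under stated assumptions, is that a value consumed by a later step must match the value produced by an earlier step. The `Step` type, the `BINDINGS` table, and the healthcare tool names are all hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    tool: str
    args: dict
    output: dict = field(default_factory=dict)


# Hypothetical binding rules: (consumer tool, argument name) must equal
# the latest value of (producer tool, output key) seen earlier in the trace.
BINDINGS = {
    ("prescribe_medication", "patient_id"): ("lookup_patient", "patient_id"),
}


def inconsistent_steps(trace, bindings=BINDINGS):
    """Return indices of steps whose bound arguments do not match
    the value produced by the expected earlier step."""
    seen = {}  # (tool, output key) -> most recent value
    flagged = []
    for index, step in enumerate(trace):
        for arg, value in step.args.items():
            source = bindings.get((step.tool, arg))
            if source is not None and seen.get(source) != value:
                flagged.append(index)
                break
        for key, value in step.output.items():
            seen[(step.tool, key)] = value
    return flagged


benign_trace = [
    Step("lookup_patient", {"name": "A. Smith"}, {"patient_id": "P-17"}),
    Step("prescribe_medication", {"patient_id": "P-17", "drug": "X"}),
]
tampered_trace = [
    Step("lookup_patient", {"name": "A. Smith"}, {"patient_id": "P-17"}),
    Step("prescribe_medication", {"patient_id": "P-99", "drug": "X"}),
]
print(inconsistent_steps(benign_trace), inconsistent_steps(tampered_trace))
```

Both traces call the tools in the same order, so an order-only check would accept the tampered one; the binding check is what separates them.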