Securing LLM Agents Needs Intent-to-Execution Integrity

📅 2026-05-16

🤖 AI Summary

Current large language model (LLM) agents lack a systematic security guarantee spanning the full pipeline from user intent to execution, which leaves them vulnerable when they interact with untrusted tools and data. This position paper introduces the concept of “intent-to-execution integrity”: drawing an analogy to compilers, it derives four integrity properties (tool, instruction, judgment, and data flow) that must hold simultaneously. Measured against these properties, existing defenses provide only partial and non-compositional coverage, which motivates using the conjunction of the four as a correctness property for evaluating and building trustworthy LLM agents.

📝 Abstract

This position paper argues that securing LLM agents requires first defining an end-to-end correctness property that specifies when an agent's execution faithfully reflects the user's intent. Modern LLM agents operate over an intent-to-execution pipeline, where natural-language instructions are translated into concrete system operations such as tool calls, API requests, and code execution. While recent defenses have made progress in constraining how agents construct tool calls, most existing formulations implicitly assume that tools are trusted. The emergence of systems such as OpenClaw, with open ecosystems of third-party skills and direct access to user environments, breaks this assumption and exposes new failure modes, including malicious or over-privileged components in the execution pipeline. Despite rapid progress in defense mechanisms, there is no adequate correctness property that defines what “secure” means for LLM agents, nor a principled way to evaluate the coverage of existing defenses. We observe that LLM agents are structurally analogous to compilers, where security violations correspond to mis-executions that do not preserve user intent. Drawing on this analogy, we identify two fundamental problem sources (untrusted data ingestion and untrusted tool execution) and derive four integrity properties that must hold simultaneously: Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity. We call their conjunction intent-to-execution integrity. Analyzing existing agentic defenses against these properties reveals that current systems provide only partial and non-compositional coverage, leaving fundamental gaps in securing modern LLM agents.
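To make the "conjunction" in the abstract concrete, here is a minimal Python sketch. Only the four property names come from the abstract; the class `IntegrityReport`, the helper `uncovered`, and the defense names in the usage note are hypothetical illustrations, not the paper's formalism or evaluation method.

```python
from dataclasses import dataclass

# Property names follow the abstract; all other names are hypothetical.
PROPERTIES = ("tool", "instruction", "judgment", "data_flow")


@dataclass(frozen=True)
class IntegrityReport:
    """Outcome of checking one agent execution against each property."""
    tool: bool
    instruction: bool
    judgment: bool
    data_flow: bool

    def intent_to_execution(self) -> bool:
        # The conjunction: all four properties must hold simultaneously.
        return self.tool and self.instruction and self.judgment and self.data_flow


def uncovered(defense_coverage: dict[str, set[str]]) -> set[str]:
    """Return the properties that no listed defense claims to cover.

    This is an optimistic view: a property appearing in the union does not
    mean the combined defenses actually guarantee it (coverage may not compose).
    """
    covered = set().union(*defense_coverage.values())
    return set(PROPERTIES) - covered
```

For example, `IntegrityReport(True, True, True, False).intent_to_execution()` evaluates to `False`, since a single failed property breaks the conjunction. Likewise, `uncovered({"defense_a": {"instruction"}, "defense_b": {"data_flow"}})` (with made-up defense names) returns the set containing `"tool"` and `"judgment"`, illustrating what partial coverage looks like.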

Problem

Research questions and friction points this paper is trying to address.

LLM agents

intent-to-execution integrity

security

untrusted tools

correctness property

Innovation

Methods, ideas, or system contributions that make the work stand out.

intent-to-execution integrity

LLM agents

tool integrity