Defeating Prompt Injections by Design

📅 2025-03-24

📈 Citations: 0

✨ Influential: 0

career value

185K/year

🤖 AI Summary

Large language model (LLM)-based agents are vulnerable to prompt injection attacks when processing untrusted inputs. To address this, we propose CaMeL, a defense framework that introduces the first explicit separation of control flow and data flow: program logic (control flow) derived from trusted queries is strictly isolated from external inputs (data flow). This separation is reinforced by capability-driven private data access control and a secure sandbox architecture, enabling formally verifiable safety guarantees. Evaluated on the AgentDojo benchmark (NeurIPS 2024), CaMeL achieves a 67% task completion rate while maintaining end-to-end formal security—outperforming all existing defenses in both robustness and provable safety.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an external environment. However, LLM agents are vulnerable to prompt injection attacks when handling untrusted data. In this paper we propose CaMeL, a robust defense that creates a protective system layer around the LLM, securing it even when underlying models may be susceptible to attacks. To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query; therefore, the untrusted data retrieved by the LLM can never impact the program flow. To further improve security, CaMeL relies on a notion of a capability to prevent the exfiltration of private data over unauthorized data flows. We demonstrate effectiveness of CaMeL by solving $67%$ of tasks with provable security in AgentDojo [NeurIPS 2024], a recent agentic security benchmark.

Problem

Research questions and friction points this paper is trying to address.

Prevent prompt injection attacks in LLM agents

Secure LLMs from untrusted data influence

Block unauthorized private data exfiltration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Creates protective system layer around LLM

Explicitly extracts control and data flows

Uses capability to prevent private data exfiltration

🔎 Similar Papers

No similar papers found.