LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge

📅 2025-06-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) deployed in email assistants are vulnerable to indirect prompt injection attacks, yet existing benchmarks lack realism and comprehensive evaluation. Method: We introduce LLMail-Inject—the first adaptive, real-world email-oriented prompt injection challenge—designed for production-like environments integrating RAG, heterogeneous models (open- and closed-source), and mainstream defenses, with an automated evaluation pipeline. Contribution/Results: Leveraging a large-scale, multi-strategy, multi-model collaborative public challenge paradigm, we systematically expose the intrinsic vulnerability of instruction-data confusion and establish a structured robustness analysis framework. We release a dataset of 208,095 unique attack instances from 839 participants, alongside fully open-sourced code, data, and analytical reports—enabling reproducible, scalable research on prompt injection mitigation.

Technology Category

Application Category

📝 Abstract
Indirect Prompt Injection attacks exploit the inherent limitation of Large Language Models (LLMs) to distinguish between instructions and data in their inputs. Despite numerous defense proposals, the systematic evaluation against adaptive adversaries remains limited, even when successful attacks can have wide security and privacy implications, and many real-world LLM-based applications remain vulnerable. We present the results of LLMail-Inject, a public challenge simulating a realistic scenario in which participants adaptively attempted to inject malicious instructions into emails in order to trigger unauthorized tool calls in an LLM-based email assistant. The challenge spanned multiple defense strategies, LLM architectures, and retrieval configurations, resulting in a dataset of 208,095 unique attack submissions from 839 participants. We release the challenge code, the full dataset of submissions, and our analysis demonstrating how this data can provide new insights into the instruction-data separation problem. We hope this will serve as a foundation for future research towards practical structural solutions to prompt injection.
Problem

Research questions and friction points this paper is trying to address.

Evaluating defenses against adaptive indirect prompt injection attacks
Assessing vulnerabilities in LLM-based email assistants
Improving instruction-data separation in Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulated realistic adaptive prompt injection attacks
Evaluated multiple defense strategies and LLM architectures
Collected large dataset for instruction-data separation analysis
🔎 Similar Papers
No similar papers found.
Sahar Abdelnabi
Sahar Abdelnabi
AI Security Researcher, Microsoft
AI SecurityAI SafetyAdversarial Machine LearningLLMs
A
Aideen Fay
Microsoft
A
Ahmed Salem
Microsoft
Egor Zverev
Egor Zverev
PhD Candidate, ISTA
Machine LearningTrustworthy Machine LearningAI safety
K
Kai-Chieh Liao
Trend Micro
C
Chi-Huang Liu
Trend Micro
C
Chun-Chih Kuo
Trend Micro
J
Jannis Weigend
Trend Micro
D
Danyael Manlangit
RainaResearch
A
Alex Apostolov
RainaResearch
H
Haris Umair
RainaResearch
J
Joao Donato
RainaResearch, University of Coimbra
M
Masayuki Kawakita
RainaResearch
Athar Mahboob
Athar Mahboob
RainaResearch
T
Tran Huu Bach
Vietnamese German University
T
Tsun-Han Chiang
Trend Micro
M
Myeongjin Cho
SK Shieldus
Hajin Choi
Hajin Choi
SK Shieldus
B
Byeonghyeon Kim
SK Shieldus
H
Hyeonjin Lee
SK Shieldus
B
Benjamin Pannell
Microsoft
C
Conor McCauley
HiddenLayer
Mark Russinovich
Mark Russinovich
Microsoft Azure CTO, Deputy CISO, Technical Fellow
CloudAIprivacycybersecurityblockchain
Andrew Paverd
Andrew Paverd
Microsoft
SecurityPrivacy
Giovanni Cherubin
Giovanni Cherubin
Microsoft
Machine LearningConformal PredictionPrivacyInformation Leakage