🤖 AI Summary
This study examines how prevalent prompt injection attacks against large language models (LLMs) are in real-world use and how that prevalence has changed over time. Although LLMs are known to be vulnerable to such attacks, their occurrence in deployed applications has been poorly characterized. Focusing on LLM-based resume screening, the work presents the first large-scale empirical analysis, based on approximately 200,000 real-world resumes. The authors design detection methods tailored to resumes, validate them manually on a small sample, and then apply them to the full dataset to measure how often injections occur and how their share changes over time. They find that roughly 1% of resumes contain hidden prompt injections, and that more than 90% of the injected prompts do not use explicit instructions. The proportion of injected resumes has also increased noticeably over the past one to two years, indicating that these stealthy manipulations are already occurring at scale in deployed LLM applications.
📝 Abstract
LLMs are vulnerable to prompt injection attacks. However, this vulnerability has been primarily demonstrated conceptually in academic studies or through a few anecdotal case studies. Its prevalence and impact in real-world LLM-based applications are largely unexplored. In this work, we present the first systematic study of prompt injection attacks in a widely used application: LLM-based resume screening. Our analysis is based on approximately 200K real-world resumes collected over multiple years by hireEZ. We first design tailored methods to detect prompt injection in resumes. Manual validation on a small-scale dataset demonstrates that our detectors achieve high precision and outperform state-of-the-art general-purpose detectors. We then apply our detector to the full resume dataset and conduct a comprehensive measurement study of real-world prompt injection attacks. Our analysis reveals several intriguing findings: approximately 1% of resumes contain hidden prompt injections; the prevalence of such injected resumes has increased noticeably over the past one to two years; and more than 90% of injected prompts do not use explicit instructions. These results provide the first evidence of large-scale prompt injection in real-world LLM-based applications and lay the groundwork for future studies to understand and mitigate such attacks.
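
The two quantities the abstract reports, detector precision on a manually validated sample and the share of injected resumes per year, can be illustrated with a short sketch. This is not the authors' code: the function names, the record layout of `(year, flagged)` pairs, and the toy data are all assumptions made for illustration, and the numbers below are invented rather than taken from the study.

```python
from collections import defaultdict


def precision(predicted, labels):
    """Fraction of detector-flagged items that manual review confirmed.

    `predicted` and `labels` are parallel sequences of booleans
    (hypothetical layout, not the paper's data format).
    """
    confirmed = [label for pred, label in zip(predicted, labels) if pred]
    if not confirmed:
        return float("nan")  # undefined when nothing was flagged
    return sum(confirmed) / len(confirmed)


def prevalence_by_year(records):
    """Map each year to the fraction of resumes flagged as injected.

    `records` is an iterable of (year, flagged) pairs.
    """
    totals = defaultdict(int)
    flagged_counts = defaultdict(int)
    for year, flagged in records:
        totals[year] += 1
        if flagged:
            flagged_counts[year] += 1
    return {year: flagged_counts[year] / totals[year] for year in sorted(totals)}


# Toy data, invented purely for illustration (not results from the study).
toy_records = (
    [(2023, False)] * 99 + [(2023, True)] * 1
    + [(2024, False)] * 97 + [(2024, True)] * 3
)
rates = prevalence_by_year(toy_records)
sample_precision = precision([True, True, False], [True, False, True])
```

Comparing the per-year fractions in `rates` is the simplest way to see whether the injected share is rising. A real measurement would also need to account for how many resumes were collected in each year.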