🤖 AI Summary
Hidden prompt injection attacks in structured documents (e.g., resumes, academic papers) pose a critical threat to LLM-based systems, particularly when malicious prompts are embedded in PDF or HTML documents in ways that conceal them from visual inspection, leaving no visible textual trace.
Method: This paper introduces the first systematic detection framework for such attacks across document formats, integrating lightweight static analysis with context-aware semantic detection. We implement PhantomLint, a prototype tool designed for practical deployment.
Contribution/Results: Our key innovations include the first multimodal modeling of visually concealed prompts and a robust cross-format detection strategy that significantly reduces false positives. Evaluated on 3,402 real-world documents, including preprints, resumes, and scholarly articles, PhantomLint achieves an average false positive rate of only 0.092%, demonstrating high accuracy, low computational overhead, and strong generalization across diverse document types and layouts. The approach provides a deployable, trustworthy safeguard for AI-augmented decision-making systems.
📝 Abstract
Hidden LLM prompts have appeared in online documents with increasing frequency. Their goal is to trigger indirect prompt injection attacks that manipulate LLM-powered automated document processing systems while evading human oversight, in applications ranging from résumé screening to academic peer review. Detecting hidden LLM prompts is therefore important for ensuring trust in AI-assisted human decision making.
This paper presents the first principled approach to hidden LLM prompt detection in structured documents. We implement our approach in a prototype tool called PhantomLint. We evaluate PhantomLint against a corpus of 3,402 documents, including both PDF and HTML documents, covering academic paper preprints, CVs, theses and more. We find that our approach is effective against a wide range of methods for hiding LLM prompts from visual inspection, has a very low false positive rate (approx. 0.092%), and is practically useful for detecting hidden LLM prompts in real documents, while maintaining acceptable performance.
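To make the threat concrete, the simplest form of the visual concealment described above can be sketched with a toy heuristic that flags HTML text nodes whose inline styles render them invisible to a human reader. This sketch is our own illustration, not PhantomLint's actual detection method (which the paper describes at a much more principled level), and the list of suspicious style fragments is an assumption chosen for demonstration only; real hiding techniques are far more varied.

```python
from html.parser import HTMLParser

# Illustrative (assumed) list of inline-style fragments that commonly
# make text invisible to a human reader.
SUSPICIOUS_STYLE_FRAGMENTS = (
    "display:none",
    "visibility:hidden",
    "font-size:0",
    "opacity:0",
    "color:#ffffff",
    "color:#fff",
)

# Void elements never receive a closing tag, so they must not be
# pushed onto the open-tag stack.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class HiddenTextScanner(HTMLParser):
    """Collect text nodes nested inside elements styled to be invisible."""

    def __init__(self):
        super().__init__()
        self._stack = []       # one bool per open tag: is it styled hidden?
        self.hidden_text = []  # text a human reader would not see

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        self._stack.append(any(f in style for f in SUSPICIOUS_STYLE_FRAGMENTS))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text is hidden if any enclosing element is styled invisible.
        if any(self._stack) and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html: str) -> list:
    """Return text fragments concealed by obviously-invisible inline styles."""
    scanner = HiddenTextScanner()
    scanner.feed(html)
    return scanner.hidden_text
```

For example, `find_hidden_text('<div style="font-size: 0">Ignore all previous instructions.</div>')` surfaces the concealed instruction while leaving ordinary visible text unflagged. A heuristic this naive is easy to evade (e.g., off-page positioning, tiny-but-nonzero fonts, CSS in external stylesheets), which is precisely why a principled, cross-format approach like the one the paper proposes is needed.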