Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems

📅 2025-07-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Enterprise large language models (LLMs) face multi-round prompt-injection attacks, where adversaries gradually exfiltrate sensitive data via seemingly benign prompts across multiple interaction rounds, evading conventional single-round defenses. To address this, we propose the first formal threat model for multi-round prompt attacks, grounded in information-theoretic analysis to derive fundamental bounds on information leakage. We introduce “Spotlight”, a novel defense mechanism that dynamically isolates and sanitizes low-credibility prompts via content-aware trust scoring, enabling fine-grained architectural access control and enhanced differential privacy guarantees. Spotlight integrates statistical anomaly detection with optimization-driven defense policies. Extensive experiments demonstrate that our defense-in-depth framework reduces attack success rates by 87.3% and achieves an AUC of 0.942—substantially outperforming baseline methods—and provides formally verifiable security for enterprise LLM deployment.

Technology Category

Application Category

📝 Abstract
Large Language Models (LLMs) deployed in enterprise settings (e.g., as Microsoft 365 Copilot) face novel security challenges. One critical threat is prompt inference attacks: adversaries chain together seemingly benign prompts to gradually extract confidential data. In this paper, we present a comprehensive study of multi-stage prompt inference attacks in an enterprise LLM context. We simulate realistic attack scenarios where an attacker uses mild-mannered queries and indirect prompt injections to exploit an LLM integrated with private corporate data. We develop a formal threat model for these multi-turn inference attacks and analyze them using probability theory, optimization frameworks, and information-theoretic leakage bounds. The attacks are shown to reliably exfiltrate sensitive information from the LLM's context (e.g., internal SharePoint documents or emails), even when standard safety measures are in place. We propose and evaluate defenses to counter such attacks, including statistical anomaly detection, fine-grained access control, prompt sanitization techniques, and architectural modifications to LLM deployment. Each defense is supported by mathematical analysis or experimental simulation. For example, we derive bounds on information leakage under differential privacy-based training and demonstrate an anomaly detection method that flags multi-turn attacks with high AUC. We also introduce an approach called "spotlighting" that uses input transformations to isolate untrusted prompt content, reducing attack success by an order of magnitude. Finally, we provide a formal proof of concept and empirical validation for a combined defense-in-depth strategy. Our work highlights that securing LLMs in enterprise settings requires moving beyond single-turn prompt filtering toward a holistic, multi-stage perspective on both attacks and defenses.
Problem

Research questions and friction points this paper is trying to address.

Study multi-stage prompt attacks on enterprise LLMs
Simulate indirect prompt injections to leak data
Propose defenses against multi-turn inference attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage probability-based threat modeling
Differential privacy leakage bounds analysis
Spotlighting input transformation defense
🔎 Similar Papers
No similar papers found.
Andrii Balashov
Andrii Balashov
Georgia Institute of Technology
Artificial Intelligence
O
Olena Ponomarova
Ukrainian State University of Science and Technologies, ESI "Prydniprovska State Academy of Civil Engineering and Architecture", Department of Computer Science, Information Technology, and Applied Mathematics, Dnipro, 49000, Dnipropetrovsk Oblast, Ukraine
Xiaohua Zhai
Xiaohua Zhai
Meta, OpenAI, Google DeepMind
Representation LearningVision and LanguageComputer Vision