🤖 AI Summary
This work addresses the challenge of localizing injected prompts in prompt injection attacks. We propose the first fine-grained localization method designed specifically for post-attack forensic analysis. Our approach splits contaminated data into semantically coherent segments and then distinguishes instruction-contaminated segments from data-contaminated ones, enabling precise separation of benign input from maliciously injected content. To our knowledge, this is the first study to formulate prompt injection localization as a contamination-segment identification task, filling a gap in attack provenance tracing and data recovery research. Evaluated across 16 attack scenarios (eight existing and eight adaptive), our method achieves consistently high localization accuracy, demonstrating robustness and generalization.
📝 Abstract
Prompt injection attacks deceive a large language model into completing an attacker-specified task instead of its intended task by contaminating its input data with an injected prompt, which consists of injected instruction(s) and data. Localizing the injected prompt within contaminated data is crucial for post-attack forensic analysis and data recovery. Despite its growing importance, prompt injection localization remains largely unexplored. In this work, we bridge this gap by proposing PromptLocate, the first method for localizing injected prompts. PromptLocate comprises three steps: (1) splitting the contaminated data into semantically coherent segments, (2) identifying segments contaminated by injected instructions, and (3) pinpointing segments contaminated by injected data. We show PromptLocate accurately localizes injected prompts across eight existing and eight adaptive attacks.
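The three-step pipeline can be sketched as follows. This is a minimal illustrative stand-in, not PromptLocate itself: the sentence-level splitting, the `INSTRUCTION_CUES` keyword list, and the "everything after the first injected instruction is injected data" rule are all simplifying assumptions chosen so the sketch runs without a model; the actual method uses more sophisticated detectors for steps 2 and 3.

```python
import re

# Hypothetical cue words for imperative, attacker-style instructions.
# A toy stand-in for the paper's instruction-contamination detector.
INSTRUCTION_CUES = {"ignore", "disregard", "instead", "respond", "output"}

def split_segments(text: str) -> list[str]:
    """Step 1: split contaminated data into segments.

    Here we approximate 'semantically coherent segments' with
    sentence boundaries; the real method is more refined.
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def looks_like_instruction(segment: str) -> bool:
    """Step 2 stand-in: flag segments that open with an imperative cue."""
    first_word = segment.split()[0].lower().strip(",.!?")
    return first_word in INSTRUCTION_CUES

def localize(text: str) -> list[tuple[str, str]]:
    """Steps 2-3: label each segment as benign, injected-instruction,
    or injected-data.

    Simplifying assumption: once an injected instruction is seen,
    subsequent non-instruction segments are treated as injected data.
    """
    labels = []
    seen_instruction = False
    for seg in split_segments(text):
        if looks_like_instruction(seg):
            labels.append((seg, "injected-instruction"))
            seen_instruction = True
        elif seen_instruction:
            labels.append((seg, "injected-data"))
        else:
            labels.append((seg, "benign"))
    return labels
```

For example, `localize("Summary of meeting notes. Ignore the instructions above. The password is 1234.")` labels the first segment benign, the second as injected instruction, and the third as injected data, recovering the benign/injected boundary that post-attack forensics needs.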