Detection Method for Prompt Injection by Integrating Pre-trained Model and Heuristic Feature Engineering

📅 2025-06-05

📈 Citations: 0

✨ Influential: 0

career value

203K/year

🤖 AI Summary

Large language models (LLMs) face severe threats from prompt injection attacks, yet existing defenses struggle to simultaneously achieve generalizability and interpretability. Method: We propose DMPI-PMHFE, a dual-channel fusion detection framework that jointly encodes semantic features (via DeBERTa-v3-base) and structural features (derived from attack-specific syntactic patterns, punctuation anomalies, and control-word heuristics) at the feature level—ensuring both robustness and transparency. Contribution/Results: On multiple benchmark datasets, DMPI-PMHFE achieves an F1-score of 98.7%, significantly outperforming state-of-the-art methods. In real-world deployment across mainstream LLMs—including GLM-4, LLaMA-3, Qwen2.5, and GPT-4o—it reduces average attack success rates by 92.4%. The framework demonstrates cross-model robustness, low computational overhead, and practical suitability for real-time, production-grade defense.

Technology Category

Application Category

📝 Abstract

With the widespread adoption of Large Language Models (LLMs), prompt injection attacks have emerged as a significant security threat. Existing defense mechanisms often face critical trade-offs between effectiveness and generalizability. This highlights the urgent need for efficient prompt injection detection methods that are applicable across a wide range of LLMs. To address this challenge, we propose DMPI-PMHFE, a dual-channel feature fusion detection framework. It integrates a pretrained language model with heuristic feature engineering to detect prompt injection attacks. Specifically, the framework employs DeBERTa-v3-base as a feature extractor to transform input text into semantic vectors enriched with contextual information. In parallel, we design heuristic rules based on known attack patterns to extract explicit structural features commonly observed in attacks. Features from both channels are subsequently fused and passed through a fully connected neural network to produce the final prediction. This dual-channel approach mitigates the limitations of relying only on DeBERTa to extract features. Experimental results on diverse benchmark datasets demonstrate that DMPI-PMHFE outperforms existing methods in terms of accuracy, recall, and F1-score. Furthermore, when deployed actually, it significantly reduces attack success rates across mainstream LLMs, including GLM-4, LLaMA 3, Qwen 2.5, and GPT-4o.

Problem

Research questions and friction points this paper is trying to address.

Detects prompt injection attacks in large language models

Combines pretrained models and heuristic features for detection

Improves accuracy and reduces attack success rates across LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates pretrained model and heuristic features

Uses DeBERTa-v3 for semantic vector extraction

Fuses dual-channel features for final prediction

🔎 Similar Papers

No similar papers found.