Prompted Contextual Vectors for Spear-Phishing Detection

📅 2024-02-13

🏛️ arXiv.org

📈 Citations: 7

✨ Influential: 0

🤖 AI Summary

Detecting highly realistic, targeted phishing emails, especially those generated by large language models (LLMs), remains a significant challenge. The paper proposes a prompt-driven document vectorization method built on an ensemble of LLMs: each model is prompted with human-crafted questions based on persuasion principles, and its answers quantify how strongly each principle appears in the email. The resulting vectors are passed to a downstream supervised classifier. The stated contributions are: (1) a novel document vectorization method that uses LLM reasoning; (2) a publicly available dataset of high-quality spear-phishing emails; and (3) a 91% F1 score in identifying LLM-generated spear-phishing emails, with a training set comprising only traditional phishing and benign emails. The authors suggest the approach can carry over to other document classification tasks, particularly adversarial ones.

📝 Abstract

Spear-phishing attacks present a significant security challenge, with large language models (LLMs) escalating the threat by generating convincing emails and facilitating target reconnaissance. To address this, we propose a detection approach based on a novel document vectorization method that utilizes an ensemble of LLMs to create representation vectors. By prompting LLMs to reason and respond to human-crafted questions, we quantify the presence of common persuasion principles in the email's content, producing prompted contextual document vectors for a downstream supervised machine learning model. We evaluate our method using a unique dataset generated by a proprietary system that automates target reconnaissance and spear-phishing email creation. Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails, with the training set comprising only traditional phishing and benign emails. Key contributions include a novel document vectorization method utilizing LLM reasoning, a publicly available dataset of high-quality spear-phishing emails, and the demonstrated effectiveness of our method in detecting such emails. This methodology can be utilized for various document classification tasks, particularly in adversarial problem domains.
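The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the questions, the keyword-based scorers that stand in for real LLM calls, the toy emails, and the nearest-centroid classifier are all assumptions chosen so the example runs without any external service. The abstract only states that an ensemble of LLMs answers human-crafted questions about persuasion principles and that the answers form a vector for a supervised model.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical questions, one per persuasion principle (not the paper's prompts).
QUESTIONS: Dict[str, str] = {
    "authority": "Does the sender invoke a position of authority?",
    "urgency": "Does the email pressure the reader to act quickly?",
    "reciprocity": "Does the email offer something and expect a return?",
}

# Stand-in for one LLM: (principle, email) -> score in [0, 1].
# A real system would send QUESTIONS[principle] plus the email to a model.
Scorer = Callable[[str, str], float]


def make_keyword_scorer(cues: Dict[str, Sequence[str]]) -> Scorer:
    """Build a toy scorer that imitates an LLM answer via keyword matching."""
    def score(principle: str, email: str) -> float:
        words = cues.get(principle, ())
        if not words:
            return 0.0
        text = email.lower()
        return sum(1 for w in words if w in text) / len(words)
    return score


def vectorize(email: str, scorers: Sequence[Scorer]) -> List[float]:
    """Concatenate every scorer's answer to every question into one vector."""
    return [scorer(p, email) for scorer in scorers for p in QUESTIONS]


def fit(X: List[List[float]], y: List[str]) -> Dict[str, List[float]]:
    """Nearest-centroid 'training': average the vectors of each label."""
    groups: Dict[str, List[List[float]]] = {}
    for vec, label in zip(X, y):
        groups.setdefault(label, []).append(vec)
    return {lab: [sum(c) / len(vs) for c in zip(*vs)] for lab, vs in groups.items()}


def predict(model: Dict[str, List[float]], vec: List[float]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, model[lab])))


# Two toy "LLMs" with different cue lists form the ensemble (assumed for illustration).
SCORERS = [
    make_keyword_scorer({"authority": ["professor", "director"],
                         "urgency": ["immediately", "today"],
                         "reciprocity": ["in return", "favor"]}),
    make_keyword_scorer({"authority": ["dean", "manager"],
                         "urgency": ["urgent", "asap"],
                         "reciprocity": ["gift", "reward"]}),
]

# Toy training set: traditional phishing and benign emails only.
train = [
    ("URGENT: your manager needs the gift cards today, reply immediately", "phishing"),
    ("The director asks a favor: wire the payment asap and get a reward in return.", "phishing"),
    ("Lunch on Friday? Let me know what works.", "benign"),
    ("Reminder: team meeting moved to 3pm.", "benign"),
]
model = fit([vectorize(text, SCORERS) for text, _ in train], [lab for _, lab in train])

probe = ("Urgent: the dean and your professor need this today. "
         "Reply immediately and a reward awaits.")
print(predict(model, vectorize(probe, SCORERS)))
```

Each vector has one entry per (scorer, question) pair, so the classifier never sees the raw text, only the ensemble's judgments about persuasion cues. Replacing the keyword scorers with real model calls would leave `vectorize`, `fit`, and `predict` unchanged.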

Problem

Research questions and friction points this paper is trying to address.

Phishing Detection

Large Language Models

Cybersecurity Defense

Innovation

Methods, ideas, or system contributions that make the work stand out.

Phishing Detection

Large Language Models

Persuasive Techniques Identification
