Prompted Contextual Vectors for Spear-Phishing Detection

📅 2024-02-13
🏛️ arXiv.org
📈 Citations: 7
Influential: 0
🤖 AI Summary
Detecting highly realistic, targeted phishing emails remains a significant challenge, especially when they are generated by large language models (LLMs) and impersonate trusted individuals (e.g., friends or instructors). To address this, we propose a prompt-driven document vectorization method that uses an ensemble of LLMs: structured prompts grounded in persuasion psychology guide the LLMs (e.g., LLaMA, GPT) to assess context-aware linguistic features of an email, and their answers form a discriminative vector representation that is then classified by a supervised model (e.g., XGBoost, SVM). Our key contributions are: (1) a novel prompt-driven approach to context-sensitive document vectorization; (2) a publicly available dataset of high-quality spear-phishing emails; and (3) a 91% F1 score on LLM-generated spear-phishing emails, achieved with a training set of only traditional phishing and benign emails and no LLM-generated samples. The approach may also transfer to other document classification tasks, particularly adversarial ones.

📝 Abstract
Spear-phishing attacks present a significant security challenge, with large language models (LLMs) escalating the threat by generating convincing emails and facilitating target reconnaissance. To address this, we propose a detection approach based on a novel document vectorization method that utilizes an ensemble of LLMs to create representation vectors. By prompting LLMs to reason and respond to human-crafted questions, we quantify the presence of common persuasion principles in the email's content, producing prompted contextual document vectors for a downstream supervised machine learning model. We evaluate our method using a unique dataset generated by a proprietary system that automates target reconnaissance and spear-phishing email creation. Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails, with the training set comprising only traditional phishing and benign emails. Key contributions include a novel document vectorization method utilizing LLM reasoning, a publicly available dataset of high-quality spear-phishing emails, and the demonstrated effectiveness of our method in detecting such emails. This methodology can be utilized for various document classification tasks, particularly in adversarial problem domains.
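
The pipeline described in the abstract (ask several LLMs human-crafted questions about an email, collect their answers as a numeric vector, then hand that vector to a supervised classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three questions, the keyword-based `make_stub_llm` scorers that stand in for real LLM calls, the toy emails, and the nearest-centroid classifier are all hypothetical, since this page does not give the actual prompts, models, or downstream classifier.

```python
from typing import Callable, Dict, List

# Hypothetical questions; the paper's actual prompts are not listed on this page.
QUESTIONS: List[str] = [
    "Does the email invoke authority?",
    "Does the email create urgency?",
    "Does the email appeal to liking or familiarity?",
]

# A "model" maps (question, email) to a score in [0, 1].
Scorer = Callable[[str, str], float]


def make_stub_llm(cues: Dict[str, List[str]]) -> Scorer:
    """Stand-in for a real LLM call: the score is the fraction of cue phrases found."""
    def score(question: str, email: str) -> float:
        phrases = cues.get(question, [])
        if not phrases:
            return 0.0
        text = email.lower()
        return sum(1 for p in phrases if p in text) / len(phrases)
    return score


def vectorize(email: str, models: List[Scorer]) -> List[float]:
    """Build one feature per (model, question) pair, in a fixed order."""
    return [model(q, email) for model in models for q in QUESTIONS]


def train(vectors: List[List[float]], labels: List[str]) -> Dict[str, List[float]]:
    """Nearest-centroid 'training': average the vectors of each class."""
    centroids: Dict[str, List[float]] = {}
    for label in set(labels):
        rows = [v for v, l in zip(vectors, labels) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[str, List[float]], vec: List[float]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(label: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[label]))
    return min(centroids, key=dist)


# Two stub "LLMs" with different cue lists, imitating an ensemble.
MODELS: List[Scorer] = [
    make_stub_llm({
        QUESTIONS[0]: ["ceo", "it department", "professor"],
        QUESTIONS[1]: ["urgent", "immediately", "expires"],
        QUESTIONS[2]: ["friend", "great work", "enjoyed"],
    }),
    make_stub_llm({
        QUESTIONS[0]: ["director", "administrator"],
        QUESTIONS[1]: ["today", "deadline"],
        QUESTIONS[2]: ["congratulations", "colleague"],
    }),
]

# Toy training set: traditional phishing and benign emails only.
train_emails = [
    ("URGENT: your account expires today. The administrator requires you to verify immediately.", "phishing"),
    ("Message from the IT department: password deadline is today, act immediately.", "phishing"),
    ("Hi team, lunch is at noon on Friday. See you there.", "benign"),
    ("Thanks, I enjoyed the workshop. Notes are attached.", "benign"),
]
centroids = train(
    [vectorize(text, MODELS) for text, _ in train_emails],
    [label for _, label in train_emails],
)

# A targeted email of a kind not present in the training set.
spear = ("Hi Dana, this is your professor. I enjoyed your talk. Please send the "
         "grades file before the deadline today, it is urgent.")
print(predict(centroids, vectorize(spear, MODELS)))
```

In a real setting each stub scorer would be replaced by a call to an actual LLM that answers the question about the email, and the nearest-centroid step by whichever supervised model is trained on the resulting vectors. The point of the sketch is only the shape of the data: a fixed-length vector with one entry per model and question.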
Problem

Research questions and friction points this paper is trying to address.

Phishing Detection
Large Language Models
Cybersecurity Defense
Innovation

Methods, ideas, or system contributions that make the work stand out.

Phishing Detection
Large Language Models
Persuasive Techniques Identification
Daniel Nahmias
Ben-Gurion University of the Negev, Accenture Cyber Research Lab
Gal Engelberg
Accenture Labs | University of Haifa
AI for Security, Security of AI, Process Mining, Knowledge Graphs, Generative AI
Dan Klein
Accenture Cyber Research Lab
A. Shabtai
Ben-Gurion University of the Negev