Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study Using the TRAPD Method

📅 2024-06-18
🏛️ arXiv.org
📈 Citations: 1
Influential: 0

🤖 AI Summary
This study empirically compares how convincing targeted smishing (SMS-based phishing) messages are when generated by a large language model (LLM), GPT-4, versus when written by humans. Method: The authors introduce TRAPD (Threshold Ranking Approach for Personalized Deception), which combines personalized persuasiveness ranking with double-blind source attribution experiments conducted in authentic professional contexts. Contribution/Results: Targets ranked AI-generated smishing messages as more persuasive than human-authored ones, especially in job-seeking scenarios, while their accuracy in identifying message origin (AI vs. human) was only ~52%, near chance level. The work provides systematic evidence that LLMs can produce highly credible, source-obscured social engineering content, exposing a real-world security risk: the weaponization of foundation models for scalable, evasive social engineering attacks.

📝 Abstract
This paper explores the use of Large Language Models (LLMs) in spear phishing message generation and evaluates their performance compared to human-authored counterparts. Our pilot study examines the effectiveness of smishing (SMS phishing) messages created by GPT-4 and human authors, which have been personalized for willing targets. The targets assessed these messages in a modified ranked-order experiment using a novel methodology we call TRAPD (Threshold Ranking Approach for Personalized Deception). Experiments involved ranking each spear phishing message from most to least convincing, providing qualitative feedback, and guessing which messages were human- or AI-generated. Results show that LLM-generated messages are often perceived as more convincing than those authored by humans, particularly job-related messages. Targets also struggled to distinguish between human- and AI-generated messages. We analyze different criteria the targets used to assess the persuasiveness and source of messages. This study aims to highlight the urgent need for further research and improved countermeasures against personalized AI-enabled social engineering attacks.
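
The two quantities the abstract describes, how highly each source's messages are ranked and how often targets guess the source correctly, can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the toy rankings, and the choice of an exact binomial test against a 50% chance level are all assumptions made for illustration, and the numbers below are invented rather than taken from the paper.

```python
from math import comb


def mean_rank_by_source(rankings):
    """Average rank position per source label.

    `rankings` is a list of orderings, one per target, each listing the
    source label ("ai" or "human") of every message from most convincing
    (position 1) to least convincing. Hypothetical data layout.
    """
    totals, counts = {}, {}
    for ordering in rankings:
        for position, source in enumerate(ordering, start=1):
            totals[source] = totals.get(source, 0) + position
            counts[source] = counts.get(source, 0) + 1
    return {source: totals[source] / counts[source] for source in totals}


def binomial_two_sided_p(correct, total, chance=0.5):
    """Exact two-sided binomial p-value for `correct` hits out of `total`.

    Sums the probability of every outcome that is no more likely than the
    observed one under the chance rate (an assumed test, not the paper's).
    """
    pmf = [
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(total + 1)
    ]
    observed = pmf[correct]
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


# Invented toy data: two targets, each ranking four messages.
toy_rankings = [
    ["ai", "ai", "human", "human"],
    ["ai", "human", "ai", "human"],
]
ranks = mean_rank_by_source(toy_rankings)

# Invented toy attribution result: 52 correct source guesses out of 100.
p_value = binomial_two_sided_p(52, 100)
```

A lower mean rank means a source's messages were placed closer to "most convincing". A large p-value for the attribution guesses means the observed accuracy cannot be distinguished from coin-flipping at that sample size.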
Problem

Research questions and friction points this paper is trying to address.

Evaluates the effectiveness of LLM-generated spear phishing SMS messages.
Compares the persuasiveness of AI-generated and human-authored phishing messages.
Highlights the need for countermeasures against AI-enabled social engineering.
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs for spear phishing message generation
TRAPD methodology for personalized deception
Comparative analysis of AI vs human-authored messages
👥 Authors
Jerson Francia
Department of Electrical and Computer Engineering, Brigham Young University, Provo, Utah 84602
Derek Hansen
Department of Electrical and Computer Engineering, Brigham Young University, Provo, Utah 84602
Benjamin L. Schooley
Department of Electrical and Computer Engineering, Brigham Young University, Provo, Utah 84602
Matthew Taylor
Department of Electrical and Computer Engineering, Brigham Young University, Provo, Utah 84602
Shydra Murray
Department of Electrical and Computer Engineering, Brigham Young University, Provo, Utah 84602
Greg Snow
Department of Statistics, Brigham Young University, Provo, Utah 84602