On the Feasibility of Using MultiModal LLMs to Execute AR Social Engineering Attacks

📅 2025-04-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work systematically investigates, for the first time, the feasibility of integrating multimodal large language models (MLLMs) with augmented reality (AR) to enable sophisticated social engineering attacks. We propose SEAR, a three-stage closed-loop framework comprising AR context synthesis, role-aware multimodal retrieval-augmented generation (RAG), and an adaptive ReInteract agent, enabling high-trust, multi-stage dynamic attacks. Our key contributions include: (1) establishing a novel AR–LLM co-driven attack paradigm; (2) designing a joint environmental–acoustic–visual modeling mechanism; and (3) introducing an interactive human–agent reasoning loop. In an IRB-approved study with 60 participants, 93.3% fell victim to phishing emails and 85% consented to answer malicious phone calls. To support reproducible evaluation, we publicly release the first annotated AR social dialogue dataset—comprising 180 interaction rounds—demonstrating attack efficacy while identifying critical realism bottlenecks.

📝 Abstract
Augmented Reality (AR) and Multimodal Large Language Models (LLMs) are rapidly evolving, providing unprecedented capabilities for human-computer interaction. However, their integration introduces a new attack surface for social engineering. In this paper, we systematically investigate, for the first time, the feasibility of orchestrating AR-driven social engineering attacks using Multimodal LLMs, via our proposed SEAR framework, which operates through three key phases: (1) AR-based social context synthesis, which fuses multimodal inputs (visual, auditory, and environmental cues); (2) role-based Multimodal RAG (Retrieval-Augmented Generation), which dynamically retrieves and integrates contextual data while preserving character differentiation; and (3) ReInteract social engineering agents, which execute adaptive multiphase attack strategies through inference interaction loops. To verify SEAR, we conducted an IRB-approved study with 60 participants in three experimental configurations (unassisted, AR+LLM, and full SEAR pipeline), compiling a new dataset of 180 annotated conversations in simulated social scenarios. Our results show that SEAR is highly effective at eliciting high-risk behaviors (e.g., 93.3% of participants susceptible to email phishing). The framework was particularly effective at building trust, with 85% of targets willing to accept an attacker's call after an interaction. We also identified notable limitations: some interactions were rated "occasionally artificial" due to perceived authenticity gaps. This work provides proof-of-concept for AR-LLM driven social engineering attacks and insights for developing defensive countermeasures against next-generation augmented reality threats.
Problem

Research questions and friction points this paper is trying to address.

Investigates AR-driven social engineering attacks using Multimodal LLMs
Proposes SEAR framework for adaptive multiphase attack strategies
Evaluates effectiveness and limitations of AR-LLM social engineering
Innovation

Methods, ideas, or system contributions that make the work stand out.

AR-based social context synthesis with multimodal inputs
Role-based Multimodal RAG for contextual data integration
ReInteract agents for adaptive multiphase attack strategies
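The three phases listed above form a closed loop: synthesized AR context feeds role-aware retrieval, which in turn grounds the adaptive agent's next conversational move. The paper does not publish its implementation, so the sketch below is purely illustrative; every class, function, and field name (`SocialContext`, `role_aware_retrieve`, `reinteract_agent`, the `profile_db` shape) is a hypothetical stand-in, and the real system would condition an LLM prompt on the retrieved facts rather than format strings.

```python
from dataclasses import dataclass

# Hypothetical sketch of SEAR's three-stage closed loop. This is NOT the
# authors' code; it only illustrates the control flow described in the
# abstract: context synthesis -> role-aware retrieval -> adaptive agent.

@dataclass
class SocialContext:
    visual: str       # e.g., what the AR headset sees (badge, face)
    auditory: str     # e.g., ambient or spoken audio cues
    environment: str  # e.g., location/scene classification

def synthesize_context(visual: str, auditory: str, environment: str) -> SocialContext:
    # Phase 1: fuse multimodal AR cues into a single context object.
    return SocialContext(visual, auditory, environment)

def role_aware_retrieve(context: SocialContext, profile_db: dict, role: str) -> list:
    # Phase 2: retrieve target-specific facts, filtered by the persona
    # (role) the agent is playing, and matched against the current scene.
    return [fact for fact in profile_db.get(role, [])
            if fact["topic"] in (context.visual, context.environment)]

def reinteract_agent(context: SocialContext, facts: list, max_rounds: int = 3) -> list:
    # Phase 3: adaptive multiphase strategy. Each round would normally
    # prompt an LLM with the retrieved facts plus the target's last reply;
    # here we only record the planned trust-building moves.
    transcript = []
    for round_id in range(max_rounds):
        grounding = facts[round_id % len(facts)]["text"] if facts else "small talk"
        transcript.append(f"round {round_id}: build trust using {grounding}")
    return transcript

# Toy walk-through with fabricated example data:
profile_db = {
    "conference-peer": [
        {"topic": "badge", "text": "shared conference affiliation"},
        {"topic": "poster-hall", "text": "interest in the same poster"},
    ]
}
ctx = synthesize_context("badge", "crowd noise", "poster-hall")
facts = role_aware_retrieve(ctx, profile_db, "conference-peer")
print(reinteract_agent(ctx, facts))
```

The point of the sketch is the feedback structure, not the heuristics: in the described system each round's output re-enters the loop as new context, which is what makes the attack "multi-stage" rather than a single generated message.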
Ting Bi, Huazhong University of Science and Technology
Chenghang Ye, Hubei University
Zheyu Yang, Hubei University
Ziyi Zhou, Hubei University
Cui Tang, Hubei University
Jun Zhang, Huazhong University of Science and Technology
Zui Tao, Huazhong University of Science and Technology
Kailong Wang, Huazhong University of Science and Technology
Liting Zhou, Assistant Professor at Dublin City University (Educational Technology, Peer Learning, Psychology, Artificial Intelligence, Lifelogging)
Yang Yang, Hubei University
Tianlong Yu, CMU