🤖 AI Summary
This study investigates the practical feasibility and real-world threat of AI-driven automated voice phishing (vishing) attacks. To this end, the authors design and implement ViKing, the first open-source, fully automated, end-to-end deployable vishing system. It integrates large language models (LLMs), real-time automatic speech recognition (ASR), text-to-speech (TTS), and telephony automation to enable dynamic dialogue generation and adaptive real-time interaction. In a controlled experiment with 240 participants, including individuals previously trained in phishing awareness, ViKing elicited sensitive information from a substantial proportion of them while maintaining high conversational naturalness and deceptive efficacy. This work provides the first empirical validation of LLMs' adaptability, and of the severe risk they pose, in authentic voice-based social engineering contexts. It highlights the emerging trend toward scalable, autonomous vishing enabled by generative AI and offers empirical grounding for developing defenses against next-generation voice-based threats.
📝 Abstract
A vishing attack is a form of social engineering in which attackers use phone calls to deceive individuals into disclosing sensitive information, such as personal data, financial information, or security credentials. Attackers exploit the perceived urgency and authenticity of voice communication to manipulate victims, often posing as legitimate entities like banks or tech support. Vishing is a particularly serious threat because it bypasses security controls designed to protect information. In this work, we study the potential for vishing attacks to escalate with the advent of AI. In theory, AI-powered software bots may be able to automate these attacks by initiating conversations with potential victims via phone calls and deceiving them into disclosing sensitive information. To validate this thesis, we introduce ViKing, an AI-powered vishing system developed using publicly available AI technology. It relies on a Large Language Model (LLM) as its core cognitive processor to steer conversations with victims, complemented by a pipeline of speech-to-text and text-to-speech modules that handle audio-text conversion during phone calls. In a controlled social experiment involving 240 participants, ViKing successfully persuaded many participants to reveal sensitive information, even those who had been explicitly warned about the risk of vishing campaigns. Interactions with ViKing's bots were generally considered realistic. From these findings, we conclude that tools like ViKing may already be accessible to potential malicious actors, while also serving as an invaluable resource for cyber awareness programs.
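The abstract describes a pipeline in which speech-to-text output feeds an LLM "cognitive processor" whose reply is converted back to audio. The turn loop of such an architecture can be sketched as below; this is an illustrative sketch only, not ViKing's actual implementation, and the function names and stubbed ASR/LLM/TTS backends are assumptions (a real system would call external speech and chat-model services):

```python
# Illustrative ASR -> LLM -> TTS turn loop for a voice-bot pipeline.
# All three backends are stubbed so the sketch is self-contained.

def transcribe(audio_chunk: bytes) -> str:
    """Stub ASR: a real system would call a speech-to-text service here."""
    return audio_chunk.decode("utf-8")  # pretend the 'audio' is already text

def generate_reply(history: list) -> str:
    """Stub LLM: a real system would send the dialogue history to a chat model."""
    last_utterance = history[-1]["content"]
    return f"I see, you said: {last_utterance}."

def synthesize(text: str) -> bytes:
    """Stub TTS: a real system would return synthesized speech audio."""
    return text.encode("utf-8")

def handle_turn(history: list, incoming_audio: bytes) -> bytes:
    """One conversational turn: transcribe, generate a reply, synthesize it."""
    user_text = transcribe(incoming_audio)
    history.append({"role": "user", "content": user_text})
    reply = generate_reply(history)
    history.append({"role": "assistant", "content": reply})
    return synthesize(reply)
```

The key design point this sketch captures is that the LLM sees the accumulated dialogue history each turn, which is what enables the adaptive, context-aware steering of the conversation that the study evaluates.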