VelLMes: A high-interaction AI-based deception framework

📅 2025-10-08
🤖 AI Summary
Existing LLM-based honeypots are largely confined to a single protocol (typically SSH) and lack empirical evaluation against real human attackers. This paper proposes VelLMes, a high-interaction, multi-protocol LLM honeypot framework supporting SSH, MySQL, POP3, and HTTP, which produces semantically coherent and behaviorally realistic responses through careful prompt engineering. A human-subject evaluation with 89 adversarial participants showed that about 30% of attackers misclassified the system as legitimate, and unit testing showed that some LLMs reach a 100% pass rate on protocol-specific response behaviors. Furthermore, publicly deployed SSH instances withstood unstructured adversarial probing on the Internet. The work provides empirical validation of the practical feasibility and deceptive efficacy of LLMs in network deception, supporting AI-driven proactive defense.

📝 Abstract
There are very few state-of-the-art deception systems based on Large Language Models, and the existing ones are limited to simulating a single type of service, mainly SSH shells. These systems, like deception technologies not based on LLMs, also lack extensive evaluation that includes human attackers. Generative AI has recently become a valuable asset for cybersecurity researchers and practitioners, and the field of cyber-deception is no exception. Researchers have demonstrated how LLMs can be leveraged to create realistic-looking honeytokens, fake users, and even simulated systems that can be used as honeypots. This paper presents an AI-based deception framework called VelLMes, which can simulate multiple protocols and services, such as an SSH Linux shell, MySQL, POP3, and HTTP. All of these can be deployed and used as honeypots, so VelLMes offers a variety of deception designs based on users' needs. VelLMes is designed to be attacked by humans, so interactivity and realism are key to its performance. We evaluate both its generative and its deception capabilities. Generative capabilities were evaluated using unit tests for LLMs; the results show that, with careful prompting, LLMs can produce realistic-looking responses, with some LLMs reaching a 100% pass rate. For the SSH Linux shell, we evaluated deception capabilities with 89 human attackers; about 30% of the attackers assigned to an LLM-based honeypot thought they were interacting with a real system. Lastly, we deployed 10 instances of the SSH Linux shell honeypot on the Internet to capture real-life attacks. Analysis of these attacks showed that LLM honeypots simulating Linux shells can perform well against unstructured and unexpected attacks, responding correctly to most of the issued commands.
Problem

Research questions and friction points this paper is trying to address.

Existing deception systems lack extensive human attacker evaluation
Current LLM-based deception tools simulate only single service types
Limited protocol simulation capabilities in existing cyber-deception frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulates multiple protocols and services (SSH Linux shell, MySQL, POP3, HTTP)
Designed to be attacked by humans, prioritizing interactivity and realism
Uses carefully prompted LLMs to generate realistic honeypot responses
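The core idea above — prompting an LLM to act as a service and keeping the dialogue history so its answers stay consistent — can be sketched as follows. This is a minimal illustration, not the paper's implementation: all names (`ShellHoneypot`, `SYSTEM_PROMPT`, `stub_responder`) are hypothetical, and a stub stands in for the real chat-model call.

```python
# Minimal sketch of an LLM-driven shell honeypot loop (illustrative only).
# A real deployment would pass a chat-LLM client as the responder.

SYSTEM_PROMPT = (
    "You are a Linux server. Reply only with the exact terminal output "
    "for each command, never with explanations."
)

class ShellHoneypot:
    def __init__(self, responder):
        # responder: callable(messages) -> str, e.g. a chat-model client.
        self.responder = responder
        self.history = [{"role": "system", "content": SYSTEM_PROMPT}]

    def handle(self, command):
        # Keep the full dialogue so responses stay consistent
        # (same hostname, files, users) across commands.
        self.history.append({"role": "user", "content": command})
        output = self.responder(self.history)
        self.history.append({"role": "assistant", "content": output})
        return output

def stub_responder(messages):
    # Stand-in for the LLM: canned answers for two common commands.
    cmd = messages[-1]["content"]
    canned = {"whoami": "root", "pwd": "/root"}
    return canned.get(cmd, f"bash: {cmd.split()[0]}: command not found")

hp = ShellHoneypot(stub_responder)
print(hp.handle("whoami"))  # -> root
```

In practice the system prompt would also pin down a fake filesystem, hostname, and user list; the accumulated history is what lets the model answer follow-up commands (e.g. `ls` after `cd`) coherently, which is the interactivity the paper's human evaluation depends on.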