🤖 AI Summary
This study investigates whether large language models (LLMs) can generate functionally equivalent malware variants that evade antivirus detection while preserving malicious behavior. To address this, the authors propose LLMalMorph, a semi-automated framework that requires no model fine-tuning. It combines function-level code extraction, structured prompt engineering, and semantics-aware code transformation strategies to produce behaviorally equivalent variants. The key contribution lies in tightly integrating the LLM's semantic understanding of code with lightweight, semantics-preserving transformations, sidestepping the limitations of syntax-only perturbations and black-box optimization approaches. Evaluated on 10 Windows-based malware samples, LLMalMorph generated 618 variants that reduced average detection rates across major commercial antivirus engines by 72.4%. The variants also evaded machine learning (ML)-based detectors that were not specifically trained on LLM-generated samples, demonstrating an LLM-driven threat paradigm for adaptive malware evolution.
📝 Abstract
Large Language Models (LLMs) have transformed software development and automated code generation. Motivated by these advancements, this paper explores the feasibility of using LLMs to modify malware source code and generate variants. We introduce LLMalMorph, a semi-automated framework that leverages the semantic and syntactic code comprehension of LLMs to generate new malware variants. LLMalMorph extracts function-level information from the malware source code and employs custom-engineered prompts, coupled with strategically defined code transformations, to guide the LLM in generating variants without resource-intensive fine-tuning. To evaluate LLMalMorph, we collected 10 diverse Windows malware samples of varying type, complexity, and functionality, and generated 618 variants. Our experiments demonstrate that the antivirus detection rates of these malware variants can be reduced to some extent while preserving malware functionality. In addition, despite not being optimized against any Machine Learning (ML)-based malware detector, several variants also achieved notable attack success rates against an ML-based malware classifier. We further discuss the limitations of current LLM capabilities in generating malware variants from source code and assess where this emerging technology stands in the broader context of malware variant generation.