MGC: A Compiler Framework Exploiting Compositional Blindness in Aligned LLMs for Malware Generation

πŸ“… 2025-07-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work exposes a critical vulnerability in the alignment mechanisms of large language models (LLMs) under compositional attacks: adversaries can modularly decompose malicious intent into multiple semantically benign subtasks, thereby evading single-prompt safety filters. To demonstrate this vulnerability, the authors present the first compositional adversarial framework targeting malware generation. The method introduces: (1) a Malicious Description Intermediate Representation (MDIR) that structurally maps high-level malicious intent to benign-appearing, executable subtasks; and (2) a compiler-inspired, multi-stage synthesis architecture that integrates modular decomposition with alignment-evasive generation techniques. Experiments show that the approach outperforms state-of-the-art jailbreaking methods by +365.79% in correctness across three benchmarks, surpasses underground commercial services by +78.07%, and successfully reproduces and even enhances 16 real-world malware samples. This constitutes the first systematic empirical validation of compositional alignment failure in LLMs.

πŸ“ Abstract
Large language models (LLMs) have democratized software development, reducing the expertise barrier for programming complex applications. This accessibility extends to malicious software development, raising significant security concerns. While LLM providers have implemented alignment mechanisms to prevent direct generation of overtly malicious code, these safeguards predominantly evaluate individual prompts in isolation, overlooking a critical vulnerability: malicious operations can be systematically decomposed into benign-appearing sub-tasks. In this paper, we introduce the Malware Generation Compiler (MGC), a novel framework that leverages this vulnerability through modular decomposition and alignment-evasive generation. MGC employs a specialized Malware Description Intermediate Representation (MDIR) to bridge high-level malicious intents and benign-appearing code snippets. Extensive evaluation demonstrates that our attack reliably generates functional malware across diverse task specifications and categories, outperforming jailbreaking methods by +365.79% and underground services by +78.07% in correctness on three benchmark datasets. Case studies further show that MGC can reproduce and even enhance 16 real-world malware samples. This work provides critical insights for security researchers by exposing the risks of compositional attacks against aligned AI systems. Demonstrations are available at https://sites.google.com/view/malware-generation-compiler.
Problem

Research questions and friction points this paper is trying to address.

Exploits LLM alignment gaps via decomposed benign sub-tasks
Generates malware through evasive modular code composition
Exposes risks of compositional attacks on aligned AI systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular decomposition evades alignment checks
MDIR bridges malicious intents to benign code
Outperforms jailbreaking by +365.79% in correctness
πŸ”Ž Similar Papers
No similar papers found.