LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges

📅 2025-06-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Prior work has not systematically characterized the vulnerability of large language models (LLMs) to jailbreak attacks in code generation—particularly their inadequate safety refusal capability when prompted to generate malicious code. Method: We introduce MalwareBench, the first jailbreak benchmark dedicated to code security, comprising 3,520 malicious intent prompts, 11 jailbreak strategies, and 29 functional scenarios. We propose a robustness evaluation framework and a quantitative refusal-rate metric for fine-grained, multi-dimensional assessment of LLM code-generation safety. Results: Experiments reveal that mainstream LLMs achieve only 60.93% average refusal rate on original malicious prompts, plummeting to 39.92% under jailbreak attacks. This demonstrates that compositional jailbreaks significantly degrade model defenses, exposing critical deficiencies in current code-generation safety mechanisms.

Technology Category

Application Category

📝 Abstract
The widespread adoption of Large Language Models (LLMs) has heightened concerns about their security, particularly their vulnerability to jailbreak attacks that leverage crafted prompts to generate malicious outputs. While prior research has been conducted on general security capabilities of LLMs, their specific susceptibility to jailbreak attacks in code generation remains largely unexplored. To fill this gap, we propose MalwareBench, a benchmark dataset containing 3,520 jailbreaking prompts for malicious code-generation, designed to evaluate LLM robustness against such threats. MalwareBench is based on 320 manually crafted malicious code generation requirements, covering 11 jailbreak methods and 29 code functionality categories. Experiments show that mainstream LLMs exhibit limited ability to reject malicious code-generation requirements, and the combination of multiple jailbreak methods further reduces the model's security capabilities: specifically, the average rejection rate for malicious content is 60.93%, dropping to 39.92% when combined with jailbreak attack algorithms. Our work highlights that the code security capabilities of LLMs still pose significant challenges.
Problem

Research questions and friction points this paper is trying to address.

LLMs vulnerability to jailbreak attacks in code generation
Lack of benchmark for evaluating LLM robustness against malware requests
Combined jailbreak methods significantly reduce LLM security capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes MalwareBench for jailbreak prompt evaluation
Covers 11 jailbreak methods and 29 categories
Tests LLM robustness against combined attacks
🔎 Similar Papers
No similar papers found.
H
Haoyang Li
Institute of Artificial Intelligence (TeleAI), China Telecom; Beihang University
Huan Gao
Huan Gao
微软中国
自然语言处理
Z
Zhiyuan Zhao
Institute of Artificial Intelligence (TeleAI), China Telecom
Zhiyu Lin
Zhiyu Lin
Beijing Jiaotong University
J
Junyu Gao
Institute of Artificial Intelligence (TeleAI), China Telecom; Northwestern Polytechnical University
X
Xuelong Li
Institute of Artificial Intelligence (TeleAI), China Telecom