LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges

📅 2025-06-09

📈 Citations: 0

✨ Influential: 0

career value

250K/year

🤖 AI Summary

Prior work has not systematically characterized the vulnerability of large language models (LLMs) to jailbreak attacks in code generation—particularly their inadequate safety refusal capability when prompted to generate malicious code. Method: We introduce MalwareBench, the first jailbreak benchmark dedicated to code security, comprising 3,520 malicious intent prompts, 11 jailbreak strategies, and 29 functional scenarios. We propose a robustness evaluation framework and a quantitative refusal-rate metric for fine-grained, multi-dimensional assessment of LLM code-generation safety. Results: Experiments reveal that mainstream LLMs achieve only 60.93% average refusal rate on original malicious prompts, plummeting to 39.92% under jailbreak attacks. This demonstrates that compositional jailbreaks significantly degrade model defenses, exposing critical deficiencies in current code-generation safety mechanisms.

Technology Category

Application Category

📝 Abstract

The widespread adoption of Large Language Models (LLMs) has heightened concerns about their security, particularly their vulnerability to jailbreak attacks that leverage crafted prompts to generate malicious outputs. While prior research has been conducted on general security capabilities of LLMs, their specific susceptibility to jailbreak attacks in code generation remains largely unexplored. To fill this gap, we propose MalwareBench, a benchmark dataset containing 3,520 jailbreaking prompts for malicious code-generation, designed to evaluate LLM robustness against such threats. MalwareBench is based on 320 manually crafted malicious code generation requirements, covering 11 jailbreak methods and 29 code functionality categories. Experiments show that mainstream LLMs exhibit limited ability to reject malicious code-generation requirements, and the combination of multiple jailbreak methods further reduces the model's security capabilities: specifically, the average rejection rate for malicious content is 60.93%, dropping to 39.92% when combined with jailbreak attack algorithms. Our work highlights that the code security capabilities of LLMs still pose significant challenges.

Problem

Research questions and friction points this paper is trying to address.

LLMs vulnerability to jailbreak attacks in code generation

Lack of benchmark for evaluating LLM robustness against malware requests

Combined jailbreak methods significantly reduce LLM security capabilities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes MalwareBench for jailbreak prompt evaluation

Covers 11 jailbreak methods and 29 categories

Tests LLM robustness against combined attacks

🔎 Similar Papers

No similar papers found.