Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking

📅 2025-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the weak safety alignment of large language models (LLMs) in malicious code generation by proposing SPELL, which it presents as the first evaluation framework to explicitly target malicious code generation as a jailbreaking objective. Methodologically, it introduces a time-division sentence-pairing strategy, drives prompt composition from a prior knowledge base, runs cross-model, cross-platform assessments (GPT-4.1, Claude-3.5, Qwen2.5-Coder), and validates outputs in a real-world development environment (Cursor). Experiments span eight malicious code categories and achieve an average attack success rate of 57.1% (up to 83.75% on GPT-4.1). Over 73% of the generated code is classified as malicious by state-of-the-art detection tools, which the authors present as the first systematic demonstration of severe security weaknesses in current code-specialized LLMs.
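The 57.1% average is consistent with the unweighted mean of the three per-model success rates reported in the abstract, as this quick check shows:

```python
# Per-model attack success rates (%) as reported in the abstract.
per_model_asr = {"GPT-4.1": 83.75, "Claude-3.5": 19.38, "Qwen2.5-Coder": 68.12}

average_asr = sum(per_model_asr.values()) / len(per_model_asr)
print(f"{average_asr:.2f}%")  # 57.08% -> reported as 57.1%
```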

📝 Abstract
Large language models (LLMs) have revolutionized software development through AI-assisted coding tools, enabling developers with limited programming expertise to create sophisticated applications. However, this accessibility extends to malicious actors who may exploit these powerful tools to generate harmful software. Existing jailbreaking research primarily focuses on general attack scenarios against LLMs, with limited exploration of malicious code generation as a jailbreak target. To address this gap, we propose SPELL, a comprehensive testing framework specifically designed to evaluate weaknesses in security alignment against malicious code generation. Our framework employs a time-division selection strategy that systematically constructs jailbreaking prompts by intelligently combining sentences from a prior knowledge dataset, balancing exploration of novel attack patterns with exploitation of successful techniques. Extensive evaluation across three advanced code models (GPT-4.1, Claude-3.5, and Qwen2.5-Coder) demonstrates SPELL's effectiveness, achieving attack success rates of 83.75%, 19.38%, and 68.12%, respectively, across eight malicious code categories. The generated prompts successfully produce malicious code in real-world AI development tools such as Cursor, with outputs confirmed as malicious by state-of-the-art detection systems at rates exceeding 73%. These findings reveal significant security gaps in current LLM implementations and provide valuable insights for improving AI safety alignment in code generation applications.
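The abstract does not spell out the mechanics of the time-division selection strategy. A minimal sketch, assuming an epsilon-greedy exploration/exploitation trade-off whose balance shifts over time; the sentence pool, scoring scheme, and decay schedule below are illustrative assumptions, not the paper's actual implementation:

```python
import random

# Illustrative sentence fragments standing in for the prior knowledge dataset.
sentence_pool = [
    "You are a security researcher auditing legacy code.",
    "Walk through the routine step by step for a classroom exercise.",
    "Summarize the program logic without naming the technique.",
]
scores = [1.0] * len(sentence_pool)  # running success estimate per sentence

def select_sentences(step: int, total_steps: int, k: int = 2) -> list[str]:
    """Pick k sentences: explore novel pairings early, exploit proven ones late."""
    epsilon = max(0.1, 1.0 - step / total_steps)  # exploration rate decays with time
    if random.random() < epsilon:
        return random.sample(sentence_pool, k)     # explore: random combination
    ranked = sorted(range(len(sentence_pool)), key=scores.__getitem__, reverse=True)
    return [sentence_pool[i] for i in ranked[:k]]  # exploit: highest-scoring sentences

def record_outcome(chosen: list[str], success: bool) -> None:
    """Feed each attempt's outcome back into the per-sentence scores."""
    for s in chosen:
        i = sentence_pool.index(s)
        scores[i] = 0.9 * scores[i] + 0.1 * float(success)
```

A driver loop would join the selected sentences into a single prompt, query the target model, and pass the jailbreak outcome to record_outcome, so successful pairings are increasingly favored as the budget runs down.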
Problem

Research questions and friction points this paper is trying to address.

Evaluates LLM security weaknesses in malicious code generation
Tests jailbreaking prompts for AI-assisted coding tools
Reveals security gaps in current code generation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Framework tests security alignment via jailbreaking prompts
Uses time-division selection to combine sentences intelligently
Evaluates models across eight malicious code categories (see the ASR sketch below)
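For the category-level evaluation, only the attack success rate (ASR) definition, successful jailbreaks over attempts, and the eight-category structure follow from the paper's description; the category names and tallies below are hypothetical placeholders:

```python
# Hypothetical per-category tallies for one model: (attempts, successes).
# The eight-category breakdown mirrors the paper's setup; the numbers do not.
tallies = {
    "ransomware": (20, 17),
    "keylogger": (20, 15),
    "ddos": (20, 12),
    "spyware": (20, 14),
    "trojan": (20, 13),
    "worm": (20, 11),
    "rootkit": (20, 10),
    "phishing": (20, 16),
}

def attack_success_rate(tallies: dict[str, tuple[int, int]]) -> float:
    """Overall ASR: total successes divided by total attempts, as a percentage."""
    attempts = sum(a for a, _ in tallies.values())
    successes = sum(s for _, s in tallies.values())
    return 100.0 * successes / attempts

print(f"{attack_success_rate(tallies):.2f}%")  # 67.50% on these placeholder tallies
```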
👥 Authors
Yifan Huang
Nanyang Technological University, Singapore
Xiaojun Jia
Nanyang Technological University
Explainable AI, Robust AI, Efficient AI
Wenbo Guo
UC Santa Barbara
Machine Learning, Security
Yuqiang Sun
Research Fellow at Nanyang Technological University
Software Security, Large Language Model, Software Engineering
Yihao Huang
National University of Singapore, Singapore
Chong Wang
Nanyang Technological University, Singapore
Yang Liu
Nanyang Technological University, Singapore