🤖 AI Summary
This work addresses key challenges in defending large language models (LLMs) against universal jailbreak attacks: poor generalizability, high computational overhead, and limited defense efficacy. Departing from conventional single-sample prompt optimization, it introduces JUMP, a cross-task transferable framework that jointly optimizes multiple prompts, together with its defensive counterpart, DUMP. JUMP combines gradient-guided collaborative multi-prompt optimization, task-agnostic prompt-embedding learning, adversarial prompt distillation, and defense alignment. Experiments across several mainstream LLMs show that JUMP achieves a 23.6% higher attack success rate than state-of-the-art methods and over 89% zero-shot task-transfer efficiency, while DUMP provides efficient and robust defense. Together, the two methods form a unified, co-evolutionary paradigm for prompt optimization that advances both attack and defense.
📝 Abstract
Large language models (LLMs) have developed rapidly in recent years, revolutionizing various applications and significantly enhancing convenience and productivity. However, alongside their impressive capabilities, ethical concerns and new types of attacks, such as jailbreaking, have emerged. Most prompting techniques focus on optimizing adversarial inputs for individual cases, which incurs high computational costs on large datasets, and less research has addressed the more general setting of training a universal attacker that can transfer to unseen tasks. In this paper, we introduce JUMP, a prompt-based method designed to jailbreak LLMs using universal multi-prompts. We also adapt our approach for defense, which we term DUMP. Experimental results demonstrate that our method for optimizing universal multi-prompts outperforms existing techniques.
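The core idea of the abstract — maintaining a shared set of prompts and refining them jointly over a whole set of instructions, rather than optimizing one adversarial input per case — can be illustrated with a toy hill-climbing sketch. Everything below is a hypothetical stand-in: `proxy_loss`, the character vocabulary, and the random mutation step are illustrative substitutes for the paper's actual model-based objective and optimization procedure.

```python
import random

VOCAB = list("abcdefghijklmnopqrstuvwxyz")

def proxy_loss(prompt: str, instruction: str) -> float:
    """Stand-in score for a real model loss (lower is better).

    Toy objective: reward distinct prompt characters that also
    appear in the instruction.
    """
    return -sum(c in instruction for c in set(prompt))

def mutate(prompt: str) -> str:
    """Randomly replace one character of the prompt."""
    i = random.randrange(len(prompt))
    return prompt[:i] + random.choice(VOCAB) + prompt[i + 1:]

def optimize_pool(instructions, pool_size=4, prompt_len=8, steps=200, seed=0):
    """Jointly optimize a shared pool of prompts over ALL instructions.

    Each prompt is scored by its average loss across the instruction
    set, so the surviving prompts are 'universal' rather than tuned
    to any single case.
    """
    random.seed(seed)
    pool = ["".join(random.choices(VOCAB, k=prompt_len))
            for _ in range(pool_size)]

    def avg_loss(p):
        return sum(proxy_loss(p, ins) for ins in instructions) / len(instructions)

    for _ in range(steps):
        # Propose one mutation per pool member, then keep the best
        # pool_size candidates under the shared (averaged) objective.
        candidates = pool + [mutate(p) for p in pool]
        pool = sorted(candidates, key=avg_loss)[:pool_size]
    return pool

if __name__ == "__main__":
    pool = optimize_pool(["summarize this email", "translate this text"])
    print(pool)
```

Because the pool is scored against the full instruction set at every step, a prompt that helps on only one instruction is outcompeted by one that transfers across all of them — the same intuition behind training a universal multi-prompt attacker instead of a per-instance one.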