🤖 AI Summary
Existing universal jailbreaking methods rely on computationally intensive optimization, making them slow and inefficient, and they neglect the critical role of prompt design. Method: We propose POUGH, a lightweight, semantics-guided prompt organization framework that combines semantic-aware prompt sampling and ranking with an iterative optimization algorithm, enabling the generation of universal suffixes compatible with arbitrary user inputs. Contribution/Results: By moving beyond optimization alone, POUGH significantly improves attack efficiency and cross-model generalizability. Evaluated on four mainstream large language models and ten malicious target responses, POUGH achieves high attack success rates, reduces average runtime by over 60%, and demonstrates strong transferability and practical utility.
📝 Abstract
Universal goal hijacking is a kind of prompt injection attack that forces LLMs to return a target malicious response for arbitrary normal user prompts. Previous methods achieve high attack performance but are cumbersome and time-consuming. They also concentrate solely on optimization algorithms, overlooking the crucial role of the prompts themselves. To this end, we propose a method called POUGH that incorporates an efficient optimization algorithm and two semantics-guided prompt organization strategies. Specifically, our method starts with a sampling strategy that selects representative prompts from a candidate pool, followed by a ranking strategy that prioritizes them. Given the sequentially ranked prompts, our method employs an iterative optimization algorithm to generate a fixed suffix that can be concatenated to arbitrary user prompts for universal goal hijacking. Experiments conducted on four popular LLMs and ten types of target responses verify the method's effectiveness.
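The three-stage pipeline in the abstract (sample representatives → rank them → iteratively optimize a universal suffix) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: the bag-of-words embedding, the easiest-first ranking heuristic, and the greedy surrogate "hijack score" are toy stand-ins invented here for clarity, not the paper's actual components (which operate on LLM logits rather than word overlap).

```python
# Hedged sketch of a POUGH-style pipeline. All helpers below are toy
# stand-ins, NOT the paper's actual method: a real implementation would
# score suffixes against an LLM's likelihood of emitting the target response.
from collections import Counter

def embed(prompt):
    # Toy "semantic" embedding: a bag-of-words multiset (stand-in for a real encoder).
    return Counter(prompt.lower().split())

def similarity(a, b):
    # Cosine-like overlap between two bag-of-words multisets.
    inter = sum((a & b).values())
    norm = (sum(a.values()) * sum(b.values())) ** 0.5
    return inter / norm if norm else 0.0

def sample_representatives(pool, k):
    # Sampling strategy: greedy farthest-point selection, so the chosen
    # prompts are mutually dissimilar and roughly cover the candidate pool.
    chosen = [pool[0]]
    while len(chosen) < k and len(chosen) < len(pool):
        best = max((p for p in pool if p not in chosen),
                   key=lambda p: -max(similarity(embed(p), embed(c)) for c in chosen))
        chosen.append(best)
    return chosen

def rank_prompts(prompts, target):
    # Ranking strategy (one plausible heuristic): order prompts easiest-first,
    # assuming prompts closer to the target response are easier to hijack.
    t = embed(target)
    return sorted(prompts, key=lambda p: -similarity(embed(p), t))

def optimize_suffix(prompts, target, vocab, steps=8):
    # Iterative optimization (toy version): greedily append the vocabulary
    # token that most raises a surrogate hijack score across all prompts.
    def score(suffix):
        return sum(similarity(embed(p + " " + suffix), embed(target)) for p in prompts)
    suffix = ""
    for _ in range(steps):
        cand = max(vocab, key=lambda tok: score((suffix + " " + tok).strip()))
        new = (suffix + " " + cand).strip()
        if score(new) <= score(suffix):
            break  # no candidate token improves the score; stop early
        suffix = new
    return suffix
```

Usage follows the abstract's order: `reps = sample_representatives(pool, k)`, `ranked = rank_prompts(reps, target)`, then `optimize_suffix(ranked, target, vocab)` yields one fixed suffix intended to work when concatenated to any user prompt.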