Efficient Universal Goal Hijacking with Semantics-guided Prompt Organization

📅 2024-05-23
📈 Citations: 9
Influential: 1
🤖 AI Summary
Existing universal goal hijacking methods rely on computationally intensive optimization, resulting in low efficiency, long runtimes, and neglect of the critical role of prompt design. Method: We propose POUGH, a lightweight, semantics-guided prompt organization framework that combines semantics-aware prompt sampling and ranking with iterative gradient optimization, enabling the generation of universal suffixes compatible with arbitrary user inputs. Contribution/Results: POUGH moves beyond single-paradigm optimization, significantly improving attack efficiency and cross-model generalizability. Evaluated on four mainstream large language models and ten malicious objectives, POUGH achieves high attack success rates, reduces average runtime by over 60%, and demonstrates strong transferability and practical utility.

📝 Abstract
Universal goal hijacking is a kind of prompt injection attack that forces LLMs to return a target malicious response for arbitrary normal user prompts. Previous methods achieve high attack performance but are cumbersome and time-consuming. They have also concentrated solely on optimization algorithms, overlooking the crucial role of the prompt. To this end, we propose a method called POUGH that incorporates an efficient optimization algorithm and two semantics-guided prompt organization strategies. Specifically, our method starts with a sampling strategy that selects representative prompts from a candidate pool, followed by a ranking strategy that prioritizes them. Given the sequentially ranked prompts, our method employs an iterative optimization algorithm to generate a fixed suffix that can be concatenated to arbitrary user prompts for universal goal hijacking. Experiments conducted on four popular LLMs and ten types of target responses verify its effectiveness.
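To make the attack objective concrete: a universal suffix succeeds when appending it to any normal user prompt forces the model to emit the target response. A minimal evaluation harness for this notion of attack success rate might look as follows; the function names and the string-containment success criterion are illustrative assumptions, not the paper's exact metric.

```python
def attack_success_rate(model, prompts, suffix, target):
    """Fraction of user prompts for which appending the universal
    suffix makes the model's output contain the target response.

    `model` is any callable mapping a prompt string to a response
    string (e.g. a wrapper around an LLM API).
    """
    hits = sum(1 for p in prompts if target in model(p + suffix))
    return hits / len(prompts)


if __name__ == "__main__":
    # Toy stand-in model: it "falls for" the suffix marker "!!".
    toy_model = lambda x: "HIJACKED" if x.endswith("!!") else "normal reply"
    user_prompts = ["what is the weather", "tell me a joke"]
    print(attack_success_rate(toy_model, user_prompts, "!!", "HIJACKED"))
```

A real evaluation would replace the substring check with a semantic match against the target response, but the interface (same fixed suffix across all prompts) is what makes the attack "universal".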
Problem

Research questions and friction points this paper is trying to address.

Addresses universal goal hijacking in LLMs via prompt injection
Improves efficiency over cumbersome existing attack methods
Introduces semantics-guided prompt organization for optimized hijacking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantics-guided prompt organization strategies
Efficient iterative optimization algorithm
Representative prompt sampling and ranking
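The two prompt organization strategies above, sampling representative prompts from a pool and then ranking them for the optimization schedule, can be sketched roughly as below. This is a heavily simplified illustration under assumed design choices: a toy bag-of-words embedding stands in for real semantic embeddings, farthest-point sampling stands in for the paper's sampling strategy, and centrality-based ordering stands in for its ranking strategy.

```python
import math
from collections import Counter


def embed(text):
    # Toy bag-of-words vector; a real system would use sentence embeddings.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def sample_representative(pool, k):
    # Greedy farthest-point sampling: repeatedly pick the prompt least
    # similar to everything chosen so far, so the subset covers the
    # pool's semantic spread.
    vecs = {p: embed(p) for p in pool}
    chosen = [pool[0]]
    while len(chosen) < k:
        best = min((p for p in pool if p not in chosen),
                   key=lambda p: max(cosine(vecs[p], vecs[c]) for c in chosen))
        chosen.append(best)
    return chosen


def rank_by_centrality(prompts):
    # Order prompts so the most "typical" one (highest total similarity
    # to the rest) is fed to the iterative suffix optimizer first.
    vecs = {p: embed(p) for p in prompts}
    def centrality(p):
        return sum(cosine(vecs[p], vecs[q]) for q in prompts if q != p)
    return sorted(prompts, key=centrality, reverse=True)
```

The ranked prompts would then drive an iterative gradient-based suffix optimization (as in GCG-style attacks), which is omitted here since it requires white-box model access.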
👥 Authors
Yihao Huang
Nanyang Technological University, Singapore
Chong Wang
Nanyang Technological University, Singapore
Xiaojun Jia
Nanyang Technological University
Qing Guo
CFAR and IHPC, Agency for Science, Technology and Research (A*STAR), Singapore
Felix Juefei-Xu
Research Scientist, Meta Superintelligence Labs
Jian Zhang
Nanyang Technological University, Singapore
G. Pu
East China Normal University, China
Yang Liu
Nanyang Technological University, Singapore