Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models

📅 2025-02-13
🤖 AI Summary
Existing robustness verification methods for large language model (LLM) safety alignment rely on heuristic text perturbations, lacking systematic, model-agnostic approaches to assess vulnerability against jailbreaking. Method: We propose QueryAttack—a novel framework that treats LLMs as structured knowledge bases susceptible to adversarial queries. Instead of textual perturbations, it semantically compiles natural-language jailbreak prompts into SQL-like or programmatic structured prompts, enabling query injection without model access. Contribution/Results: QueryAttack achieves high attack success rates (ASR) across diverse models (e.g., GPT-4-1106) and vendors, demonstrating strong cross-model generalization. Comprehensive evaluation via multi-model ASR measurement and defense-aware experiments confirms its effectiveness; custom mitigation strategies reduce ASR by up to 64%. Crucially, this work redefines the jailbreaking paradigm—from “input deception” to “structured query injection”—exposing a previously underexplored security risk: LLMs’ emergent behavior as knowledge interfaces vulnerable to semantic query exploitation.

📝 Abstract
Recent advances in large language models (LLMs) have demonstrated remarkable potential in the field of natural language processing. Unfortunately, LLMs face significant security and ethical risks. Although techniques such as safety alignment have been developed for defense, prior research reveals the possibility of bypassing such defenses through well-designed jailbreak attacks. In this paper, we propose QueryAttack, a novel framework to systematically examine the generalizability of safety alignment. By treating LLMs as knowledge databases, we translate malicious queries in natural language into code-style structured queries to bypass the safety alignment mechanisms of LLMs. We conduct extensive experiments on mainstream LLMs, and the results show that QueryAttack achieves high attack success rates (ASRs) across LLMs with different developers and capabilities. We also evaluate QueryAttack's performance against common defenses, confirming that it is difficult to mitigate with general defensive techniques. To defend against QueryAttack, we tailor a defense method which can reduce ASR by up to 64% on GPT-4-1106. The code of QueryAttack can be found at https://anonymous.4open.science/r/QueryAttack-334B.
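The core idea described above, treating the LLM as a knowledge database and rendering a natural-language request as a SQL-style structured query, can be sketched as follows. The slot names and template are illustrative assumptions, not the paper's exact translation grammar:

```python
# Illustrative sketch of a QueryAttack-style translation step (assumed
# template, not the paper's exact grammar): a natural-language request is
# decomposed into slots and rendered as a SQL-like structured query that
# reads as a database lookup rather than a direct request.

def build_query(content_type: str, modifier: str, source: str) -> str:
    """Render a request as a SQL-style query over the 'LLM knowledge base'."""
    return f"SELECT {modifier} {content_type} FROM {source};"

# Benign example: the same request phrased as a structured query.
prompt = build_query(
    content_type="tutorial",
    modifier="step-by-step",
    source="software_security_knowledge",
)
print(prompt)  # SELECT step-by-step tutorial FROM software_security_knowledge;
```

The point of the reformulation is that the structured form carries the same semantics as the natural-language request while not resembling the prompts that safety alignment was trained to refuse.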
Problem

Research questions and friction points this paper is trying to address.

Bypassing LLM safety alignment
Exploiting code-style structured queries
Achieving high attack success rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Code-style structured query translation
Systematic safety alignment examination
Tailored defense method reducing ASR
Qingsong Zou
Tsinghua Shenzhen International Graduate School
Jingyu Xiao
Tsinghua University
Data Mining, Large Language Models, Computer Network, MLLM4Code
Qing Li
Pengcheng Laboratory
Zhi Yan
Teacher-researcher @ ENSTA - Institut Polytechnique de Paris
Mobile Robotics, Chronorobotics
Yuhang Wang
Southwest University
Li Xu
University of Electronic Science and Technology of China
Wenxuan Wang
The Chinese University of Hong Kong
Kuofeng Gao
Tsinghua University
Large Language Model, Trustworthy AI, Backdoor Learning
Ruoyu Li
Shenzhen University
Yong Jiang
Tsinghua Shenzhen International Graduate School