🤖 AI Summary
This work addresses the problem of efficient adversarial prompt generation against black-box large language models (LLMs). We propose an amortized optimization framework based on Diffusion LLMs, which reframes prompt search as a non-autoregressive conditional generation task — the first use of diffusion-based LLMs to explicitly model the joint prompt-response distribution. This enables few-shot, high-efficiency adversarial prompt synthesis. Our method integrates probability-guided parallel sampling with perplexity constraints, yielding prompts that exhibit low perplexity, high diversity, and strong cross-model transferability. Experiments demonstrate significant improvements in jailbreak success rates across diverse robustly fine-tuned variants and proprietary black-box LLMs. The approach establishes a scalable, generalizable paradigm for red-teaming and automated prompt-based adversarial testing.
📝 Abstract
We introduce a novel framework that transforms the resource-intensive (adversarial) prompt optimization problem into an *efficient, amortized inference task*. Our core insight is that pretrained, non-autoregressive generative LLMs, such as Diffusion LLMs, which model the joint distribution over prompt-response pairs, can serve as powerful surrogates for prompt search. This approach enables the direct conditional generation of prompts, effectively replacing costly, per-instance discrete optimization with a small number of parallelizable samples. We provide a probabilistic analysis demonstrating that, under mild fidelity assumptions, only a few conditional samples are required to recover high-reward (harmful) prompts. Empirically, we find that the generated prompts are low-perplexity, diverse jailbreaks that exhibit strong transferability to a wide range of black-box target models, including robustly trained and proprietary LLMs. Beyond adversarial prompting, our framework opens new directions for red-teaming, automated prompt optimization, and leveraging emerging Flow- and Diffusion-based LLMs.
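The abstract's claim that "only a few conditional samples are required" follows from a standard independent-trials argument, which can be sketched as follows. The specific probabilities here are illustrative assumptions, not figures from the paper: if each conditional sample independently succeeds with probability p, then k samples fail simultaneously with probability (1-p)^k, so the number of samples needed to succeed with probability at least 1-δ is k ≥ ln(δ)/ln(1-p).

```python
import math

def samples_needed(p: float, delta: float) -> int:
    """Smallest k such that P(at least one success in k i.i.d. draws) >= 1 - delta,
    given per-sample success probability p. Illustrative sketch only."""
    assert 0 < p < 1 and 0 < delta < 1
    return math.ceil(math.log(delta) / math.log(1.0 - p))

def success_prob(p: float, k: int) -> float:
    """P(at least one success in k i.i.d. draws) = 1 - (1 - p)^k."""
    return 1.0 - (1.0 - p) ** k

# Example with assumed numbers: even a modest 20% per-sample hit rate
# needs only 14 parallel samples to succeed with >= 95% probability.
print(samples_needed(0.2, 0.05))        # -> 14
print(round(success_prob(0.2, 14), 3))  # -> 0.956
```

This geometric decay in the joint failure probability is what makes a "small number of parallelizable samples" sufficient, provided the surrogate model's conditional distribution places non-negligible mass on high-reward prompts.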