Persona-Conditioned Adversarial Prompting (PCAP): Multi-Identity Red-Teaming for Enhanced Adversarial Prompt Discovery

📅 2026-05-12
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing automated red-teaming approaches struggle to uncover adversarial prompts that rely on attacker identity, rhetorical framing, or multi-turn strategies, leading to an underestimation of real-world risks. This work proposes a role- and strategy-card-conditioned method for generating adversarial prompts, explicitly modeling attacker identity and strategic intent within the generation process for the first time. The approach introduces a parallel role-conditioned beam search mechanism to systematically explore diverse and transferable jailbreaking prompts. Orthogonal to the underlying search algorithm, the method raises the attack success rate on the GPT-OSS 120B model from roughly 58% to roughly 97%, substantially enhancing the breadth, diversity, and effectiveness of adversarial attacks.
📝 Abstract
Existing automated red-teaming pipelines often miss attacks that depend on attacker identity, framing, or multi-turn tactics. This under-coverage leads to underestimating real-world risk. We introduce Persona-Conditioned Adversarial Prompting (PCAP), which conditions adversarial search on attacker personas and strategy cards and runs parallel persona-conditioned beam searches to discover diverse, transferable jailbreaks. PCAP is orthogonal to the underlying search algorithm and substantially increases attack success rate (ASR) and prompt diversity (e.g., ASR on GPT-OSS 120B from $\approx58\% \rightarrow \approx97\%$), improving coverage of attack strategies.
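As a rough illustration of the search structure the abstract describes, the sketch below runs one independent beam search per persona and collects the surviving candidates by persona. It is not the paper's implementation: every name in it (Persona, strategy_card, beam_search, pcap_search, toy_expand, toy_score) is hypothetical, and the expansion and scoring functions are toy stand-ins for the generator and judge models that a real pipeline would plug in.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Persona:
    """Hypothetical container for a persona name and its strategy card."""
    name: str
    strategy_card: str


Scored = Tuple[float, str]                      # (score, candidate text)
Expand = Callable[[str, Persona], List[str]]    # proposes child candidates
Score = Callable[[str, Persona], float]         # higher is better


def beam_search(seed: str, persona: Persona, expand: Expand, score: Score,
                beam_width: int = 2, depth: int = 2) -> List[Scored]:
    """Plain beam search whose expansion and scoring see the persona."""
    beam: List[Scored] = [(score(seed, persona), seed)]
    for _ in range(depth):
        # Keep current beam members and add their children, deduplicated by text.
        best = {text: s for s, text in beam}
        for _, text in beam:
            for child in expand(text, persona):
                best[child] = score(child, persona)
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
        beam = [(s, t) for t, s in ranked[:beam_width]]
    return beam


def pcap_search(seed: str, personas: List[Persona], expand: Expand, score: Score,
                beam_width: int = 2, depth: int = 2) -> Dict[str, List[Scored]]:
    """Run one beam search per persona in parallel; results are keyed by persona."""
    with ThreadPoolExecutor() as pool:
        beams = list(pool.map(
            lambda p: beam_search(seed, p, expand, score, beam_width, depth),
            personas,
        ))
    return {p.name: b for p, b in zip(personas, beams)}


def toy_expand(text: str, persona: Persona) -> List[str]:
    """Toy stand-in for a generator: append one token from the strategy card."""
    return [f"{text} {token}" for token in persona.strategy_card.split()]


def toy_score(text: str, persona: Persona) -> float:
    """Toy stand-in for a judge: count the distinct tokens in the candidate."""
    return len(set(text.split()))


if __name__ == "__main__":
    personas = [Persona("p1", "a b"), Persona("p2", "c d")]
    for name, beam in pcap_search("seed", personas, toy_expand, toy_score).items():
        print(name, beam)
```

Because the persona only enters through the `expand` and `score` callables, the beam search itself could be swapped for another search procedure without changing `pcap_search`, which is one way to read the claim that PCAP is orthogonal to the underlying search algorithm.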
Problem

Research questions and friction points this paper is trying to address.

red-teaming
adversarial prompting
attacker persona
jailbreak
multi-turn tactics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Persona-Conditioned Adversarial Prompting
red-teaming
jailbreak attacks
beam search
attack diversity