From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation

📅 2025-11-05
🏛️ Conference on Empirical Methods in Natural Language Processing
📈 Citations: 0 · Influential: 0
🤖 AI Summary
To address the challenge of evaluating the adversarial robustness of large language models (LLMs) on safety-critical tasks, this paper proposes StaDec and DyDec, two zero-shot, training-free adversarial attack frameworks. Methodologically, they leverage an LLM's intrinsic semantic understanding, using prompt-engineered, fully automated text rewriting to generate semantically preserved, natural-looking adversarial examples without relying on external heuristic rules. The contributions are threefold: (1) strong cross-model transferability and the ability to evolve alongside advances in LLMs; (2) high attack success rates against target models unknown to the attacker while maintaining semantic similarity (e.g., BLEU > 0.85, BERTScore > 0.92); and (3) open-source code and benchmark data for reproducible robustness evaluation. Experiments demonstrate consistent efficacy across diverse LLMs, including LLaMA-2, Qwen, and GLM, under both white-box and black-box settings, advancing standardized, scalable assessment of LLM robustness.
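The paper does not reproduce its prompts or pipeline code here, so the following is a minimal, hypothetical sketch of the kind of rewrite-and-test loop the summary describes. The function name `adversarial_rewrite_loop` and the callables `rewrite`, `predict`, and `similar` are stand-ins supplied by the caller (e.g., an attacker-LLM paraphrase call, the target model's classifier, and a BERTScore-style similarity metric); none are part of the released codebase.

```python
from typing import Callable, Optional


def adversarial_rewrite_loop(
    text: str,
    rewrite: Callable[[str], str],        # attacker LLM: paraphrase the input
    predict: Callable[[str], str],        # target LLM: returns a label
    similar: Callable[[str, str], float],  # semantic similarity in [0, 1]
    sim_threshold: float = 0.85,
    max_iters: int = 10,
) -> Optional[str]:
    """Iteratively rewrite `text` until the target model's prediction flips,
    accepting only candidates that stay semantically close to the original."""
    original_label = predict(text)
    candidate = text
    for _ in range(max_iters):
        candidate = rewrite(candidate)
        if similar(text, candidate) < sim_threshold:
            continue  # drifted too far from the original; keep rewriting
        if predict(candidate) != original_label:
            return candidate  # successful adversarial example
    return None  # attack failed within the iteration budget
```

Plugging in real components would mean calling an attacker LLM inside `rewrite` and the target model inside `predict`; the loop structure itself is what "automated, LLM-driven pipeline" plausibly refers to.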

📝 Abstract
LLMs can provide substantial zero-shot performance on diverse tasks using a simple task prompt, eliminating the need for training or fine-tuning. However, when applying these models to sensitive tasks, it is crucial to thoroughly assess their robustness against adversarial inputs. In this work, we introduce Static Deceptor (StaDec) and Dynamic Deceptor (DyDec), two innovative attack frameworks designed to systematically generate dynamic and adaptive adversarial examples by leveraging the understanding of the LLMs. We produce subtle and natural-looking adversarial inputs that preserve semantic similarity to the original text while effectively deceiving the target LLM. By utilizing an automated, LLM-driven pipeline, we eliminate the dependence on external heuristics. Our attacks evolve with the advancements in LLMs and demonstrate strong transferability across models unknown to the attacker. Overall, this work provides a systematic approach for the self-assessment of an LLM's robustness. We release our code and data at https://github.com/Shukti042/AdversarialExample.
Problem

Research questions and friction points this paper is trying to address.

Systematically generates adaptive adversarial examples to test LLM robustness
Creates semantically preserved but deceptive inputs without external heuristics
Provides automated framework for self-assessment across evolving LLM architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated LLM-driven pipeline eliminates external heuristics
Generates subtle adversarial examples preserving semantic similarity
Dynamic frameworks adapt and transfer across unknown models
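The summary reports BLEU and BERTScore thresholds as the semantic-preservation criterion. Computing those metrics requires dedicated libraries, so the sketch below substitutes a crude Jaccard token-overlap proxy purely to illustrate the gating logic; `lexical_similarity` and `passes_similarity_gate` are illustrative names, not functions from the paper's released code, and the 0.85 default merely echoes the reported BLEU threshold.

```python
def lexical_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercased token sets: a crude stand-in for the
    BLEU/BERTScore similarity metrics the paper actually reports."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty strings are trivially identical
    return len(ta & tb) / len(ta | tb)


def passes_similarity_gate(original: str, candidate: str,
                           threshold: float = 0.85) -> bool:
    """Accept a candidate adversarial rewrite only if it stays close
    enough to the original text."""
    return lexical_similarity(original, candidate) >= threshold
```

In a faithful reproduction, `lexical_similarity` would be replaced by sentence-level BLEU or BERTScore computed with the appropriate library.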
Authors
Najrin Sultana (The Pennsylvania State University)
Md. Rafi Ur Rashid (The Pennsylvania State University)
Kang Gu (Dartmouth College)
Shagufta Mehnaz (The Pennsylvania State University)
Information Security & Privacy