SEP-Attack: A Simple and Effective Paradigm for Transfer-Based Textual Adversarial Attack

📅 2026-05-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses limitations in existing transfer-based textual adversarial attacks, which suffer from suboptimal proxy model weighting and inadequate word importance estimation, thereby constraining attack transferability. To overcome these issues, the authors propose SEP-Attack, a novel approach that leverages Determinantal Point Processes (DPPs) to generate diverse ensemble weights for proxy models. Building upon this, they introduce a word importance scoring mechanism based on reweighted prediction confidence and incorporate a transferability quantification metric to select the most effective adversarial examples. Experimental results demonstrate that SEP-Attack significantly outperforms state-of-the-art methods across four benchmark datasets and two real-world APIs, achieving higher attack success rates and improved generalization in black-box settings.
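To make the idea of diverse ensemble weights concrete, here is a minimal sketch of one way a DPP can favor diversity. It is not the authors' procedure: it assumes a random pool of candidate weight vectors on the simplex, an RBF similarity kernel, and greedy MAP selection (a common approximation to DPP inference). The names `greedy_dpp_select`, `random_simplex_weights`, and the `sigma` value are hypothetical and chosen only for illustration.

```python
import math
import random


def rbf_kernel(a, b, sigma=0.3):
    """Similarity between two weight vectors (assumed kernel, not from the paper)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return det


def greedy_dpp_select(candidates, k, sigma=0.3):
    """Greedily pick k candidates that maximize det(L_S), the DPP volume term.

    Near-duplicate weight vectors make the kernel submatrix almost singular,
    so the determinant penalizes them and diverse vectors are preferred.
    """
    k = min(k, len(candidates))
    kernel = [[rbf_kernel(a, b, sigma) for b in candidates] for a in candidates]
    selected = []
    for _ in range(k):
        best_index, best_det = None, -1.0
        for i in range(len(candidates)):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[kernel[r][c] for c in idx] for r in idx]
            d = determinant(sub)
            if d > best_det:
                best_index, best_det = i, d
        selected.append(best_index)
    return [candidates[i] for i in selected]


def random_simplex_weights(n_models, rng):
    """Random non-negative weights that sum to 1 (one weight per surrogate)."""
    raw = [rng.random() for _ in range(n_models)]
    total = sum(raw)
    return [x / total for x in raw]


rng = random.Random(0)
pool = [random_simplex_weights(3, rng) for _ in range(50)]
diverse_weights = greedy_dpp_select(pool, k=5)
```

Each vector in `diverse_weights` could then serve as one weighting of three surrogate models. How the paper constructs its kernel and samples from the DPP is not stated in this summary.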

📝 Abstract

Despite the strong performance of deep neural networks in modern Web and language applications, they remain vulnerable to adversarial attacks, especially transferable attacks that generate adversarial examples using surrogate models without accessing the victim model. Transferable attacks in the text domain are still under-explored, with only a few studies addressing this challenging issue, often with suboptimal results due to equal treatment of submodels or inaccurate estimation of importance scores. To address these challenges, we propose a simple yet effective paradigm for transfer-based textual adversarial attack, named SEP-Attack. Specifically, we employ the Determinantal Point Process (DPP) to generate diverse surrogate ensemble weights, representing the transferability of submodels. Using these weights, we introduce a new metric to evaluate prediction confidence scores, which in turn are used to calculate word importance scores and generate adversarial candidates. Finally, we quantify the transferability score for each candidate and select the top ones as the final transferable adversarial examples. Experiments conducted on four datasets and two real-world APIs validate the efficacy of SEP-Attack, which significantly outperforms state-of-the-art baselines.
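The step from ensemble weights to word importance scores can be illustrated with a small sketch. It assumes a deletion-based importance measure (the drop in weighted ensemble confidence when a word is removed), which is a common choice in textual attacks but is not confirmed by this abstract as the paper's metric. The toy keyword classifiers, `make_keyword_model`, and the weights `[0.7, 0.3]` are hypothetical stand-ins for real surrogate models and DPP-derived weights.

```python
def make_keyword_model(positive_words, strength):
    """Toy surrogate classifier returning [p_negative, p_positive] (hypothetical)."""
    def model(tokens):
        hits = sum(1 for t in tokens if t in positive_words)
        p_pos = 1 - 0.5 * (1 - strength) ** hits
        return [1 - p_pos, p_pos]
    return model


def ensemble_confidence(tokens, label, surrogates, weights):
    """Weighted sum of each surrogate's confidence in the given label."""
    return sum(w * model(tokens)[label] for model, w in zip(surrogates, weights))


def word_importance(tokens, label, surrogates, weights):
    """Score each word by the confidence drop caused by deleting it."""
    base = ensemble_confidence(tokens, label, surrogates, weights)
    scores = []
    for i in range(len(tokens)):
        ablated = tokens[:i] + tokens[i + 1:]
        scores.append(base - ensemble_confidence(ablated, label, surrogates, weights))
    return scores


surrogates = [
    make_keyword_model({"great"}, strength=0.8),
    make_keyword_model({"great", "moving"}, strength=0.5),
]
weights = [0.7, 0.3]  # assumed ensemble weights, one per surrogate
tokens = ["a", "great", "and", "moving", "film"]

scores = word_importance(tokens, label=1, surrogates=surrogates, weights=weights)
ranked = [word for _, word in sorted(zip(scores, tokens), reverse=True)]
```

Words at the front of `ranked` would be the first candidates for substitution when generating adversarial examples. Because the first surrogate carries more weight, a word that only it relies on would rank higher than under equal weighting, which is the effect the abstract attributes to non-uniform submodel weights.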

Problem

Research questions and friction points this paper is trying to address.

transfer-based adversarial attack

textual adversarial example

surrogate model

word importance scoring

transferability

Innovation

Methods, ideas, or system contributions that make the work stand out.

transfer-based adversarial attack

Determinantal Point Process

surrogate ensemble