Differential Robustness in Transformer Language Models: Empirical Evaluation Under Adversarial Text Attacks

📅 2025-09-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Assessing the robustness of pre-trained language models against adversarial attacks remains critical for trustworthy NLP deployment. Method: This study systematically evaluates Flan-T5, BERT-Base, and RoBERTa-Base under two state-of-the-art black-box adversarial attacks, TextFooler and BERTAttack, using semantics-preserving adversarial examples and accuracy drop as the primary robustness metric. Contribution/Results: RoBERTa-Base and Flan-T5 exhibit exceptional robustness (0% attack success rate), whereas BERT-Base suffers catastrophic degradation: under TextFooler its accuracy plunges from 48% to 3%, revealing severe vulnerability. The work identifies pre-training objectives and architectural design as decisive factors governing textual robustness, challenging the assumption that model scale alone ensures resilience. It further highlights the high computational cost of current robust models and proposes directions for lightweight, efficient defense strategies. These findings establish a reproducible benchmark and mechanistic insights for evaluating large language model security.

📝 Abstract
This study evaluates the resilience of large language models (LLMs) against adversarial attacks, focusing on Flan-T5, BERT-Base, and RoBERTa-Base. Using systematically designed adversarial tests with TextFooler and BERTAttack, we found significant variation in model robustness. RoBERTa-Base and Flan-T5 demonstrated remarkable resilience, maintaining accuracy even when subjected to sophisticated attacks, with attack success rates of 0%. In contrast, BERT-Base showed considerable vulnerability: TextFooler achieved a 93.75% success rate, reducing model accuracy from 48% to just 3%. Our research reveals that while certain LLMs have developed effective defensive mechanisms, these safeguards often require substantial computational resources. This study contributes to the understanding of LLM security by identifying strengths and weaknesses in current safeguarding approaches and proposes practical recommendations for developing more efficient and effective defensive strategies.
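For context, the 93.75% figure follows from the quoted accuracies once the attack success rate is computed, as is conventional, over only the examples the model classified correctly before the attack. A minimal arithmetic sketch, assuming a 100-example evaluation set (the set size is an illustrative assumption, not stated above):

```python
# Hypothetical arithmetic behind the reported attack success rate.
# Assumes a 100-example evaluation set (an illustrative assumption).
n_examples = 100
clean_correct = int(0.48 * n_examples)  # 48% clean accuracy -> 48 correct
adv_correct = int(0.03 * n_examples)    # 3% accuracy under TextFooler -> 3 correct
flipped = clean_correct - adv_correct   # 45 predictions flipped by the attack

# Success rate is conventionally measured over originally-correct examples.
attack_success_rate = flipped / clean_correct
print(f"{attack_success_rate:.2%}")     # -> 93.75%
```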
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM resilience against adversarial text attacks
Assessing robustness variations across Flan-T5, BERT-Base, and RoBERTa-Base
Identifying computational requirements for effective defensive mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluating LLM robustness via adversarial attacks
Testing models with the TextFooler and BERTAttack methods (see the sketch after this list)
Proposing efficient defensive strategies for LLMs
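Both attack methods are available as recipes in the open-source TextAttack library. The following is a minimal sketch of how such an evaluation pipeline is typically assembled, not the authors' released code; the checkpoint and dataset names (an SST-2 fine-tuned BERT-Base and the GLUE SST-2 validation split) are illustrative assumptions:

```python
# Minimal TextAttack evaluation sketch (illustrative; model/dataset choices
# are assumptions, not the paper's exact experimental setup).
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from textattack import AttackArgs, Attacker
from textattack.attack_recipes import BERTAttackLi2020, TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Wrap a fine-tuned classifier so TextAttack can query its predictions.
checkpoint = "textattack/bert-base-uncased-SST-2"  # assumed victim model
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
victim = HuggingFaceModelWrapper(model, tokenizer)

dataset = HuggingFaceDataset("glue", "sst2", split="validation")

# Run each attack recipe; TextAttack logs a summary table with clean
# accuracy, accuracy under attack, and attack success rate per run.
for recipe in (TextFoolerJin2019, BERTAttackLi2020):
    attack = recipe.build(victim)
    args = AttackArgs(num_examples=100, disable_stdout=True)
    results = Attacker(attack, dataset, args).attack_dataset()
```

The summary that TextAttack reports after each run maps directly onto the robustness metrics used above: accuracy under attack corresponds to the accuracy-drop measure, and attack success rate is computed over originally-correct predictions.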
Taniya Gidatkar
Department of Computing and Mathematics, Manchester Metropolitan University, United Kingdom
Oluwaseun Ajao
Department of Computing and Mathematics, Manchester Metropolitan University, United Kingdom
Matthew Shardlow
Reader in Natural Language Processing, Manchester Metropolitan University
Natural Language Processing · Text Simplification · Lexical Complexity Prediction