Robustness of Misinformation Classification Systems to Adversarial Examples Through BeamAttack

πŸ“… 2025-06-30
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work evaluates the adversarial robustness of fake news classification systems with a semantics-preserving, word-level attack. Building on BeamAttack, it adds word deletion, a skip-substitution option, and LIME-based interpretability guidance to better prioritize which words to perturb, and uses beam search to efficiently find minimal-perturbation attack paths. Evaluated against BiLSTM, BERT, and adversarially trained RoBERTa victims on multiple benchmark datasets, the method achieves attack success rates above 99% while maintaining high semantic consistency and lexical similarity (measured via BLEU and Word Mover's Distance). The approach combines high attack efficacy with strong interpretability, offering a practical tool for assessing the robustness of text classification models.
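To make the search procedure concrete, here is a minimal Python sketch of a beam-search, word-level attack in the spirit of the method described above. The helper names (`predict_proba`, `get_candidates`), the whitespace tokenization, and the binary flip criterion are illustrative assumptions; the authors' actual implementation is at https://github.com/LucK1Y/BeamAttack.

```python
# Minimal sketch of a beam-search, word-level attack in the spirit of BeamAttack.
# Assumptions (not from the paper): the victim model is wrapped as
# predict_proba(text) -> {label: probability}, get_candidates(word) returns
# substitute words, and the flip criterion below assumes a binary task.
from typing import Callable, Dict, List, Optional

def beam_attack(tokens: List[str],
                true_label: str,
                predict_proba: Callable[[str], Dict[str, float]],
                get_candidates: Callable[[str], List[str]],
                word_order: List[int],
                beam_width: int = 5) -> List[Optional[str]]:
    """Perturb positions in `word_order` (e.g. a LIME importance ranking),
    keeping the `beam_width` hypotheses that most reduce the true-class score."""
    def to_text(hyp: List[Optional[str]]) -> str:
        # None marks a deleted word; keeping list length fixed preserves indices.
        return " ".join(w for w in hyp if w is not None)

    beam = [(list(tokens), predict_proba(to_text(tokens))[true_label])]
    for idx in word_order:
        expanded = []
        for hyp, _ in beam:
            # Per position: keep the word (skip-substitution), delete it, or substitute it.
            for option in [hyp[idx], None] + get_candidates(tokens[idx]):
                new_hyp = hyp[:idx] + [option] + hyp[idx + 1:]
                score = predict_proba(to_text(new_hyp))[true_label]
                if score < 0.5:          # binary case: the prediction has flipped
                    return new_hyp
                expanded.append((new_hyp, score))
        # Keep only the hypotheses that push the true-class probability lowest.
        beam = sorted(expanded, key=lambda pair: pair[1])[:beam_width]
    return beam[0][0]
```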

πŸ“ Abstract
We extend BeamAttack, an adversarial attack algorithm designed to evaluate the robustness of text classification systems through word-level modifications guided by beam search. Our extensions include support for word deletions and the option to skip substitutions, enabling the discovery of minimal modifications that alter model predictions. We also integrate LIME to better prioritize word replacements. Evaluated across multiple datasets and victim models (BiLSTM, BERT, and adversarially trained RoBERTa) within the BODEGA framework, our approach achieves over a 99% attack success rate while preserving the semantic and lexical similarity of the original texts. Through both quantitative and qualitative analysis, we highlight BeamAttack's effectiveness and its limitations. Our implementation is available at https://github.com/LucK1Y/BeamAttack
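As a hedged illustration of the similarity checks named in the summary, the sketch below scores an original/perturbed pair with BLEU (lexical overlap, via NLTK) and Word Mover's Distance (semantic drift, via gensim). The example sentences and the "glove-wiki-gigaword-50" embedding are illustrative assumptions, not the paper's exact evaluation setup.

```python
# BLEU for lexical overlap and Word Mover's Distance for semantic drift;
# sentences and embedding choice are illustrative only.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
import gensim.downloader as api

original = "the senator denied the report about the leaked documents".split()
perturbed = "the senator rejected the report about the leaked files".split()

# BLEU: n-gram overlap between the original and adversarial text (higher = more similar).
bleu = sentence_bleu([original], perturbed,
                     smoothing_function=SmoothingFunction().method1)

# Word Mover's Distance over pretrained embeddings (lower = more similar).
vectors = api.load("glove-wiki-gigaword-50")
wmd = vectors.wmdistance(original, perturbed)

print(f"BLEU: {bleu:.3f}  WMD: {wmd:.3f}")
```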
Problem

Research questions and friction points this paper is trying to address.

Evaluates the robustness of text classification systems to adversarial word-level modifications
Discovers the minimal text changes needed to alter a model's prediction
Measures attack success rates while preserving semantic and lexical similarity to the original text
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends BeamAttack with word deletions and the option to skip substitutions
Integrates LIME to prioritize which words to replace (see the sketch below)
Achieves over a 99% attack success rate across victim models
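The following sketch illustrates the LIME-guided prioritization idea referenced above: words are ranked by how much they contribute to the model's prediction, and the attack perturbs the most influential positions first. It assumes the victim model is wrapped as predict_proba(list_of_texts) -> array of class probabilities; the class names and whitespace tokenization are illustrative assumptions, not the paper's exact setup.

```python
# LIME-guided word prioritization (sketch). Assumes predict_proba takes a list
# of raw texts and returns an (n_samples, n_classes) numpy array, as LIME expects.
import numpy as np
from lime.lime_text import LimeTextExplainer

def lime_word_order(text, predict_proba, class_names=("credible", "fake")):
    """Return word positions sorted by LIME importance (most influential first),
    i.e. the order in which the beam search considers perturbing them."""
    explainer = LimeTextExplainer(class_names=list(class_names))
    explanation = explainer.explain_instance(
        text, predict_proba, num_features=len(text.split()))
    weights = dict(explanation.as_list())   # word -> signed contribution
    tokens = text.split()
    importance = [abs(weights.get(tok, 0.0)) for tok in tokens]
    return list(np.argsort(importance)[::-1])
```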