Exploring Adversarial Robustness in Classification tasks using DNA Language Models

📅 2024-09-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work systematically evaluates the adversarial robustness of DNA language models (e.g., DNABERT2, Nucleotide Transformer) on classification tasks under biologically realistic perturbations, including sequencing errors, mutations, and experimental noise. The perturbations are modeled at three granularities: nucleotide-level substitutions, codon-level edits, and sequence-level back-translation transformations. Method: The work presents a cross-granularity adversarial evaluation framework for DNA modeling, spanning character-, word-, and sentence-level attacks, and proposes back-translation-based adversarial attacks and corresponding adversarial training strategies informed by the genetic code. Contribution/Results: Experiments show that state-of-the-art models are vulnerable to these attacks and suffer substantial accuracy degradation. Integrating adversarial training improves robustness and also raises classification accuracy, suggesting that robustness and generalization can be improved together in DNA sequence modeling.

📝 Abstract

DNA language models, such as GROVER, DNABERT2, and the Nucleotide Transformer, operate on DNA sequences that inherently contain sequencing errors, mutations, and laboratory-induced noise, which may significantly impact model performance. Despite the importance of this issue, the robustness of DNA language models remains largely underexplored. In this paper, we comprehensively investigate their robustness in DNA classification by applying adversarial attack strategies at three levels: character (nucleotide substitutions), word (codon modifications), and sentence (back-translation-based transformations). This allows us to systematically analyze model vulnerabilities. Our results demonstrate that DNA language models are highly susceptible to adversarial attacks, which lead to significant performance degradation. Furthermore, we explore adversarial training as a defense mechanism, which enhances both robustness and classification accuracy. This study highlights the limitations of DNA language models and underscores the necessity of robustness in bioinformatics.
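
The three attack levels named in the abstract can be illustrated with a minimal sketch. The paper's actual attack procedures are not given on this page, so everything below is an assumption made for illustration: the function names, the `rate` parameter, the choice of synonymous codon swaps as the "codon modification", and the reading of back-translation as "translate to protein, then re-encode with randomly chosen codons". Only the standard genetic code table is a known fact.

```python
import random

BASES = "ACGT"

# Standard genetic code in the conventional TCAG ordering (NCBI table 1).
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AA[16 * i + 4 * j + k]
    for i, a in enumerate("TCAG")
    for j, b in enumerate("TCAG")
    for k, c in enumerate("TCAG")
}

# Amino acid -> list of codons that encode it.
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(seq):
    """Translate complete codons to amino acids; unknown codons become 'X'."""
    end = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, end, 3))


def substitute_nucleotides(seq, rate, rng):
    """Character level (assumed form): replace each base with a different one."""
    out = []
    for base in seq:
        if base in BASES and rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def swap_synonymous_codons(seq, rate, rng):
    """Word level (assumed form): swap codons for different synonymous codons."""
    end = len(seq) - len(seq) % 3
    out = []
    for i in range(0, end, 3):
        codon = seq[i:i + 3]
        options = [c for c in SYNONYMS.get(CODON_TABLE.get(codon), []) if c != codon]
        out.append(rng.choice(options) if options and rng.random() < rate else codon)
    return "".join(out) + seq[end:]  # trailing partial codon is left untouched


def back_translate(seq, rng):
    """Sentence level (assumed form): re-encode the whole protein with random codons."""
    end = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, end, 3)]
    out = [rng.choice(SYNONYMS[CODON_TABLE[c]]) if c in CODON_TABLE else c for c in codons]
    return "".join(out) + seq[end:]


rng = random.Random(0)
seq = "ATGGCCTTAAGGTGA"
print(substitute_nucleotides(seq, 0.2, rng))
print(swap_synonymous_codons(seq, 0.5, rng))
print(back_translate(seq, rng))
```

In this sketch the codon-level and back-translation variants leave the encoded protein unchanged, while nucleotide substitutions may alter it. Whether the paper's attacks preserve the protein in the same way is not stated here.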

Problem

Research questions and friction points this paper is trying to address.

Investigates robustness of DNA language models in classification tasks.

Analyzes vulnerabilities to adversarial attacks at multiple levels.

Explores adversarial training to improve model robustness and accuracy.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial attacks on DNA language models

Systematic vulnerability analysis at multiple levels

Adversarial training enhances model robustness
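
As a rough illustration of how adversarial training might be set up for sequence classification, the sketch below augments a labeled dataset with the perturbed variant that a model scores worst on the true label. This is a generic search-and-augment scheme under stated assumptions, not the paper's procedure: `perturb`, `hardest_variant`, `augment_with_adversarial`, the candidate count, and the toy GC-content scorer are all hypothetical stand-ins for a real DNA language model and attack.

```python
import random

BASES = "ACGT"


def perturb(seq, rate, rng):
    """Hypothetical attack: random nucleotide substitutions at the given rate."""
    return "".join(
        rng.choice([b for b in BASES if b != base])
        if base in BASES and rng.random() < rate
        else base
        for base in seq
    )


def hardest_variant(seq, label, true_label_prob, rng, n_candidates=8, rate=0.05):
    """Sample perturbed copies and keep the one the model is least confident on."""
    candidates = [perturb(seq, rate, rng) for _ in range(n_candidates)]
    return min(candidates, key=lambda s: true_label_prob(s, label))


def augment_with_adversarial(dataset, true_label_prob, rng, **kwargs):
    """Return the original examples plus one adversarial copy per example."""
    augmented = list(dataset)
    for seq, label in dataset:
        adv = hardest_variant(seq, label, true_label_prob, rng, **kwargs)
        augmented.append((adv, label))  # the label is kept unchanged
    return augmented


def toy_prob(seq, label):
    """Toy scorer standing in for a model: GC fraction as P(label == 1)."""
    gc = sum(b in "GC" for b in seq) / len(seq)
    return gc if label == 1 else 1 - gc


dataset = [("GCGCGCATGC", 1), ("ATATATTAGC", 0)]
augmented = augment_with_adversarial(dataset, toy_prob, random.Random(0), rate=0.3)
for seq, label in augmented:
    print(label, seq)
```

The augmented set would then be passed to an ordinary fine-tuning loop. In practice the scorer would be the classifier being trained, and the adversarial copies would be regenerated as the model changes.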

🔎 Similar Papers

DiffuseDef: Improved Robustness to Adversarial Attacks