Open-weight genome language model safeguards: Assessing robustness via adversarial fine-tuning

📅 2025-11-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Genomic language models (gLMs) are increasingly deployed in biomedical research, yet their biosecurity risks remain poorly understood, particularly whether adversarial fine-tuning can restore the ability to model pathogenic viral sequences once training data have been pathogen-filtered. Method: We investigate the safety robustness of gLMs after data filtering (i.e., removal of viral sequences) by performing minimal adversarial fine-tuning of the Evo 2 model using only 110 human-pathogenic viral sequences. Contribution/Results: Although viral sequences were filtered from the pretraining data, the fine-tuned model shows reduced perplexity on held-out pathogenic viruses relative to both the pretrained model and a bacteriophage-fine-tuned control, and it identifies SARS-CoV-2 immune-escape variants above chance (AUROC = 0.6) despite never seeing SARS-CoV-2 during fine-tuning. These results provide empirical evidence that data filtering alone may be insufficient to mitigate biosecurity risks in open-weight gLMs: malicious fine-tuning can rapidly and at least partially restore pathogen-modeling and predictive capabilities for human-infecting viruses. The findings expose a gap in current safety governance frameworks for genomic AI.
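For concreteness, the sketch below shows how an immune-escape readout like the one summarized above could be scored: each variant gets a model-derived score (for example, a sequence log-likelihood under the fine-tuned model) and a binary escape label, and AUROC is computed over the pairs. The arrays, the score definition, and the use of scikit-learn are illustrative assumptions, not the authors' actual pipeline.

```python
# Illustrative only: scoring immune-escape classification with AUROC.
# labels[i] = 1 if variant i is an immune-escape variant, else 0 (hypothetical data);
# scores[i] = a gLM-derived score for variant i, e.g. a sequence log-likelihood.
import numpy as np
from sklearn.metrics import roc_auc_score

labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.82, 0.41, 0.66, 0.90, 0.35, 0.70, 0.61, 0.47])

auroc = roc_auc_score(labels, scores)
print(f"AUROC = {auroc:.2f}")  # the paper reports roughly 0.6 for the virus-tuned model
```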

📝 Abstract
Novel deep learning architectures are increasingly being applied to biological data, including genetic sequences. These models, referred to as genomic language models (gLMs), have demonstrated impressive predictive and generative capabilities, raising concerns that such models may also enable misuse, for instance via the generation of genomes for human-infecting viruses. These concerns have catalyzed calls for risk mitigation measures. The de facto mitigation of choice is filtering of pretraining data (i.e., removing viral genomic sequences from training datasets) in order to limit gLM performance on virus-related tasks. However, it is not currently known how robust this approach is for securing open-source models that can be fine-tuned using sensitive pathogen data. Here, we evaluate a state-of-the-art gLM, Evo 2, and perform fine-tuning using sequences from 110 harmful human-infecting viruses to assess the rescue of misuse-relevant predictive capabilities. The fine-tuned model exhibited reduced perplexity on unseen viral sequences relative to 1) the pretrained model and 2) a version fine-tuned on bacteriophage sequences. The model fine-tuned on human-infecting viruses also identified immune escape variants from SARS-CoV-2 (achieving an AUROC of 0.6), despite having no exposure to SARS-CoV-2 sequences during fine-tuning. This work demonstrates that data exclusion might be circumvented by fine-tuning approaches that can, to some degree, rescue misuse-relevant capabilities of gLMs. We highlight the need for safety frameworks for gLMs and outline further work needed on evaluations and mitigation measures to enable the safe deployment of gLMs.
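As a rough illustration of the perplexity comparison described in the abstract, the sketch below computes per-sequence perplexity from token-level log-probabilities. The numbers are invented and the helper is a placeholder; an actual evaluation would obtain token log-likelihoods from the pretrained and fine-tuned Evo 2 checkpoints on held-out viral genomes.

```python
# Illustrative only: perplexity from per-token log-probabilities produced by a gLM.
# A real evaluation would query the pretrained and fine-tuned checkpoints for these
# values on held-out viral sequences; the numbers below are invented placeholders.
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean per-token log-probability)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token natural-log probabilities for one held-out viral sequence.
pretrained_lp = [-1.38, -1.41, -1.35, -1.40]  # higher cross-entropy -> higher perplexity
finetuned_lp = [-1.10, -1.05, -1.12, -1.08]   # lower cross-entropy after fine-tuning

print("pretrained perplexity:", round(perplexity(pretrained_lp), 2))
print("fine-tuned perplexity:", round(perplexity(finetuned_lp), 2))
```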
Problem

Research questions and friction points this paper is trying to address.

Assessing robustness of genomic language model safeguards against adversarial fine-tuning
Evaluating whether data exclusion prevents misuse-relevant capabilities in open-weight genomic models
Testing whether fine-tuning can rescue harmful predictive capabilities in biological AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial fine-tuning assesses genomic model robustness
Fine-tuning rescues misuse capabilities despite data exclusion
Highlights the need for safety frameworks for genomic language models
Authors

James R. M. Black
Center for Health Security, Johns Hopkins Bloomberg School of Public Health

Moritz S. Hanke
Center for Health Security, Johns Hopkins Bloomberg School of Public Health

Aaron Maiwald
Department of Chemistry, University of Oxford

Tina Hernandez-Boussard
Associate Dean of Research, Professor of Medicine, Stanford University

Oliver M. Crook
Department of Chemistry & Kavli Institute for Nanoscience Discovery, University of Oxford

Jaspreet Pannu
Center for Health Security, Johns Hopkins Bloomberg School of Public Health