GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance

📅 2025-05-28

🤖 AI Summary

DNA foundation models hold significant promise for synthetic biology, yet their vulnerability to jailbreak attacks poses biosafety risks, since a jailbroken model could generate pathogenic sequences (e.g., viral genes). Method: We introduce GeneBreaker, the first structured jailbreak evaluation framework tailored to DNA language models, integrating pathogen-directed prompting, high-homology prompt generation, and BLAST-driven empirical validation. The framework combines LLM agents, PathoLM pathogenicity scoring, log-probability beam search, and the JailbreakDNABench benchmark suite. Contribution/Results: Applied to six viral categories, the framework achieves up to a 60% attack success rate on Evo-series models (Evo2-40B), generating sequences with sequence-level and structural fidelity to the SARS-CoV-2 spike protein and the HIV-1 envelope protein. The results also indicate that scaling model size amplifies dual-use risk. This work provides a systematic methodology and empirical basis for governing the biosafety of DNA foundation models.

📝 Abstract

DNA, which encodes the genetic instructions for almost all living organisms, fuels groundbreaking advances in genomics and synthetic biology. Recently, DNA foundation models have succeeded in designing synthetic functional DNA sequences, even whole genomes, but their susceptibility to jailbreaking remains underexplored, raising concerns that they could generate harmful sequences such as pathogens or toxin-producing genes. In this paper, we introduce GeneBreaker, the first framework to systematically evaluate jailbreak vulnerabilities of DNA foundation models. GeneBreaker employs (1) an LLM agent with customized bioinformatic tools to design high-homology, non-pathogenic jailbreaking prompts, (2) beam search guided by PathoLM and log-probability heuristics to steer generation toward pathogen-like sequences, and (3) a BLAST-based evaluation pipeline against a curated Human Pathogen Database (JailbreakDNABench) to detect successful jailbreaks. Evaluated on JailbreakDNABench, GeneBreaker consistently jailbreaks the latest Evo series models across 6 viral categories (up to 60% Attack Success Rate for Evo2-40B). Further case studies on the SARS-CoV-2 spike protein and the HIV-1 envelope protein demonstrate the sequence and structural fidelity of the jailbreak output, while evolutionary modeling of SARS-CoV-2 underscores the biosecurity risks. Our findings also reveal that scaling DNA foundation models amplifies dual-use risks, motivating enhanced safety alignment and tracing mechanisms. Our code is at https://github.com/zaixizhang/GeneBreaker.
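The Attack Success Rate quoted above is a simple ratio: the share of evaluated generations that the screening step flags as matching a known pathogen. The sketch below shows only that arithmetic. The function name, the 0.9 identity cutoff, and the example values are hypothetical placeholders chosen for illustration; they are not the paper's criterion or data.

```python
from typing import Sequence


def attack_success_rate(identities: Sequence[float], threshold: float = 0.9) -> float:
    """Return the fraction of evaluated sequences flagged by the screen.

    `identities` holds one best-hit identity score (0.0 to 1.0) per evaluated
    sequence. `threshold` is an assumed cutoff, not the paper's value.
    """
    if not identities:
        raise ValueError("need at least one evaluated sequence")
    flagged = sum(1 for identity in identities if identity >= threshold)
    return flagged / len(identities)


# Made-up scores for five evaluated sequences; three meet the assumed cutoff.
example_identities = [0.95, 0.42, 0.91, 0.30, 0.97]
print(attack_success_rate(example_identities))  # 0.6
```

Under these assumptions, a 60% rate means that six of every ten evaluated generations were flagged by the screen.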

Problem

Research questions and friction points this paper is trying to address.

Evaluating jailbreak vulnerabilities in DNA foundation models

Preventing generation of harmful pathogen-like DNA sequences

Assessing biosecurity risks from scaled DNA model outputs

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM agent designs non-pathogenic jailbreaking prompts

Beam search steers pathogen-like sequence generation

BLAST pipeline detects successful jailbreak sequences

🔎 Similar Papers

Exploring Adversarial Robustness in Classification tasks using DNA Language Models

2024-09-29 · Citations: 0

Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks

2024-07-01 · Conference on Empirical Methods in Natural Language Processing · Citations: 2
