Inference-Time Toxicity Mitigation in Protein Language Models

📅 2026-03-04

🤖 AI Summary
This work addresses an underappreciated biosecurity risk: protein language models fine-tuned on specific taxonomic groups may generate toxic sequences even when toxicity is not the training objective. To mitigate this without retraining, the study adapts Logit Diff Amplification (LDA) as an inference-time control mechanism. Across four taxonomic groups, LDA reduces the predicted toxicity rate (measured by ToxDL2) below the taxon-finetuned baseline while preserving the biological plausibility and predicted foldability of generated sequences, as assessed by Fréchet ESM Distance and pLDDT. By contrast, activation-based steering methods tend to degrade sequence properties.

📝 Abstract
Protein language models (PLMs) are becoming practical tools for de novo protein design, yet their dual-use potential raises safety concerns. We show that domain adaptation to specific taxonomic groups can elicit toxic protein generation, even when toxicity is not the training objective. To address this, we adapt Logit Diff Amplification (LDA) as an inference-time control mechanism for PLMs. LDA modifies token probabilities by amplifying the logit difference between a baseline model and a toxicity-finetuned model, requiring no retraining. Across four taxonomic groups, LDA consistently reduces predicted toxicity rate (measured via ToxDL2) below the taxon-finetuned baseline while preserving biological plausibility. We evaluate quality using Fréchet ESM Distance and predicted foldability (pLDDT), finding that LDA maintains distributional similarity to natural proteins and structural viability (unlike activation-based steering methods that tend to degrade sequence properties). Our results demonstrate that LDA provides a practical safety knob for protein generators that mitigates elicited toxicity while retaining generative quality.
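
The abstract describes LDA as amplifying the logit difference between a baseline model and a toxicity-finetuned model at each decoding step. A minimal sketch of that idea follows, in plain Python on a toy three-token vocabulary. The update rule `base + alpha * (base - toxic)`, the function names, and the `alpha` parameter are assumptions made for illustration; the paper's exact formulation, sign convention, and choice of baseline may differ.

```python
import math


def lda_logits(base_logits, toxic_logits, alpha=1.0):
    """Shift next-token logits away from a toxicity-finetuned model.

    Assumed rule (not confirmed by the source):
        steered = base + alpha * (base - toxic)
    alpha = 0 returns the baseline logits unchanged; larger alpha
    pushes harder against tokens the toxic model prefers.
    """
    if len(base_logits) != len(toxic_logits):
        raise ValueError("both models must share the same vocabulary")
    return [b + alpha * (b - t) for b, t in zip(base_logits, toxic_logits)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy logits for one decoding step over a hypothetical 3-token vocabulary.
base = [1.0, 0.5, 0.0]    # taxon-finetuned (baseline) model
toxic = [1.0, 2.0, 0.0]   # toxicity-finetuned model strongly prefers token 1

steered = lda_logits(base, toxic, alpha=1.0)
probs_before = softmax(base)
probs_after = softmax(steered)
```

In this toy case the token favored by the toxicity-finetuned model loses probability mass after steering, while tokens on which both models agree keep their baseline logits. In a real generator the same adjustment would be applied to the full vocabulary logits before sampling each residue.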
Problem

Research questions and friction points this paper is trying to address.

Protein language models
toxicity mitigation
dual-use risk
de novo protein design
inference-time control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Logit Diff Amplification
inference-time control
protein language models
toxicity mitigation
dual-use safety
Authors
Manuel Fernández Burda
Laboratory of Applied Artificial Intelligence (LIAA), Institute of Computer Sciences (ICC), CONICET - Universidad de Buenos Aires
Santiago Aranguri
PhD Student, NYU Courant
Iván Arcuschin Moreno
AI Safety Argentina (AISAR)
Enzo Ferrante
CONICET & Universidad de Buenos Aires
Medical Imaging, Machine Learning, Computer Vision, ML Fairness