AmpLyze: A Deep Learning Model for Predicting the Hemolytic Concentration

📅 2025-07-10
🤖 AI Summary
Existing toxicity predictors for antimicrobial peptides (AMPs) only classify a peptide as “toxic” or “non-toxic”; they do not regress the actual hemolytic concentration (HC₅₀). To address this, we propose AmpLyze, an interpretable deep learning model that predicts HC₅₀ directly from the amino acid sequence alone. The method combines residue-level embeddings from ProtT5 and ESM2 with sequence-level descriptors in a dual-branch local–global architecture, aligns the two branches with cross-attention, and is trained with log-cosh loss for robustness to assay noise. Interpretability comes from Expected Gradients attributions, which identify the residues that drive hemolysis. On an independent test set, the model achieves a Pearson correlation coefficient (PCC) of 0.756 and an MSE of 0.987, outperforming classical regressors and state-of-the-art baselines. Ablation studies indicate that both branches are essential and that cross-attention adds a further 1% PCC and 3% MSE improvement. This work enables end-to-end quantitative assessment and residue-level interpretation of AMP hemolytic potency, supporting the rational design of safe and effective antimicrobial peptides.
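The log-cosh loss mentioned in the summary can be illustrated with a minimal, dependency-free sketch. The function names `log_cosh` and `log_cosh_loss` are illustrative assumptions, not identifiers from the AmpLyze code, and the paper's actual implementation (framework, reduction, any scaling) may differ. The sketch uses the identity log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2) so that large residuals do not overflow.

```python
import math


def log_cosh(residual: float) -> float:
    """Numerically stable log(cosh(residual)).

    Behaves like residual**2 / 2 for small residuals and like
    |residual| - log(2) for large ones, which is why it is less
    sensitive to outliers than squared error.
    """
    a = abs(residual)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def log_cosh_loss(y_true, y_pred) -> float:
    """Mean log-cosh loss over paired targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    total = sum(log_cosh(p - t) for t, p in zip(y_true, y_pred))
    return total / len(y_true)


# Hypothetical targets and predictions, for illustration only.
example_loss = log_cosh_loss([1.0, 2.0, 3.0], [1.1, 1.8, 6.0])
```

In this sketch the third pair, with a residual of 3.0, contributes roughly linearly rather than quadratically, which is the property that makes the loss tolerant of noisy labels.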

📝 Abstract
Red-blood-cell lysis (HC50) is the principal safety barrier for antimicrobial-peptide (AMP) therapeutics, yet existing models only say "toxic" or "non-toxic." AmpLyze closes this gap by predicting the actual HC50 value from sequence alone and explaining the residues that drive toxicity. The model couples residue-level ProtT5/ESM2 embeddings with sequence-level descriptors in dual local and global branches, aligned by a cross-attention module and trained with log-cosh loss for robustness to assay noise. The optimal AmpLyze model reaches a PCC of 0.756 and an MSE of 0.987, outperforming classical regressors and the state-of-the-art. Ablations confirm that both branches are essential, and cross-attention adds a further 1% PCC and 3% MSE improvement. Expected-Gradients attributions reveal known toxicity hotspots and suggest safer substitutions. By turning hemolysis assessment into a quantitative, sequence-based, and interpretable prediction, AmpLyze facilitates AMP design and offers a practical tool for early-stage toxicity screening.
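The two metrics quoted in the abstract, PCC and MSE, can be computed as in the sketch below. This is a generic illustration using only the standard library; the helper names `mse` and `pearson` and the sample values are assumptions, and the abstract does not state the scale on which HC50 targets are expressed (for example raw or log-transformed), so no particular scale is implied here.

```python
import math


def mse(y_true, y_pred) -> float:
    """Mean squared error between paired targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and equal in length")
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(y_true, y_pred) -> float:
    """Pearson correlation coefficient (PCC) between two sequences."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0.0 or var_p == 0.0:
        raise ValueError("PCC is undefined when either input is constant")
    return cov / math.sqrt(var_t * var_p)


# Hypothetical measured and predicted values, for illustration only.
measured = [1.2, 2.5, 3.1, 4.8]
predicted = [1.0, 2.9, 3.0, 4.1]
report = {"PCC": pearson(measured, predicted), "MSE": mse(measured, predicted)}
```

Reporting both is useful because they answer different questions: PCC measures how well predictions track the ranking and linear trend of the measurements, while MSE penalizes absolute deviation.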
Problem

Research questions and friction points this paper is trying to address.

Predicts actual HC50 value from peptide sequence alone
Explains toxicity-driving residues in antimicrobial peptides
Improves hemolysis assessment with quantitative interpretable predictions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Predicts HC50 from sequence using deep learning
Combines ProtT5/ESM2 embeddings with sequence descriptors
Uses cross-attention for local-global branch alignment
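The cross-attention idea listed above can be sketched as single-head scaled dot-product attention in which one branch supplies the queries and the other supplies the keys and values. Everything specific here is an assumption made for illustration: the function names, the choice of which branch acts as the query, and the omission of learned projection matrices, multiple heads, and normalization. The source does not describe the module at this level of detail.

```python
import math


def softmax(scores):
    """Softmax over a list of floats, shifted by the max for stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: vectors from one branch (assumed here: the global branch).
    keys, values: vectors from the other branch (assumed: per-residue
    local features). Returns one attended vector per query.
    """
    if len(keys) != len(values) or len(keys) == 0:
        raise ValueError("keys and values must be non-empty and equal in length")
    scale = 1.0 / math.sqrt(len(keys[0]))
    width = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        attended = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        outputs.append(attended)
    return outputs


# Toy inputs: one "global" query attending over two "local" residue vectors.
local_keys = [[1.0, 0.0], [0.0, 1.0]]
local_values = [[2.0, 0.0], [0.0, 4.0]]
fused = cross_attention([[0.0, 0.0]], local_keys, local_values)
```

With an all-zero query, every key receives the same score, so the output is simply the mean of the value vectors; a query that resembles one key shifts the weight toward that key's value.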
Authors

Peng Qiu
Carnegie Mellon University
Hanqi Feng
Carnegie Mellon University
Barnabas Poczos
Associate professor, Carnegie Mellon University
Artificial Intelligence, Machine Learning, Statistics