Topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction

📅 2024-07-12
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current anticancer peptide (ACP) identification methods suffer from weak sequence feature representation and limited interpretability, which hinders efficient screening. To address this, we propose a topological representation of peptide sequences, built from sequence "connection" information and characterized by vector and spectral descriptors. Combining these topological features with an Extra-Trees ensemble classifier yields a lightweight, interpretable machine learning model, Top-ML. On the AntiCP 2.0 and mACPpred 2.0 benchmarks, Top-ML achieves state-of-the-art performance or results comparable to existing deep learning models, while providing greater interpretability. These results suggest that topology-based featurization can accelerate the identification of anticancer peptides.

📝 Abstract
Recently, therapeutic peptides have demonstrated great promise for cancer treatment. To explore powerful anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient featurization of peptides has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction. Top-ML employs peptide topological features derived from sequence "connection" information, characterized by vector and spectral descriptors. Our Top-ML model, employing an Extra-Trees classifier, has been validated on the AntiCP 2.0 and mACPpred 2.0 benchmark datasets, achieving state-of-the-art performance or results comparable to existing deep learning models, while providing greater interpretability. Our results highlight the potential of leveraging novel topology-based featurization to accelerate the identification of anticancer peptides.
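The abstract does not spell out how the vector and spectral descriptors are built, so the following is only a rough sketch of the general idea, not the authors' construction. It assumes a 20x20 residue "connection" matrix counting sequence-adjacent residue pairs, uses its upper triangle as a vector descriptor, and summarizes the eigenvalues of its graph Laplacian as a spectral descriptor. The function names, the `max_gap` parameter, the choice of summary statistics, and the toy peptides and labels are all hypothetical and chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def connection_matrix(seq, max_gap=1):
    """Symmetric 20x20 matrix counting residue pairs at most max_gap apart.

    Assumption: this is one plausible reading of sequence "connection"
    information, not the definition used in the paper.
    """
    A = np.zeros((20, 20))
    for p in range(len(seq)):
        for gap in range(1, max_gap + 1):
            if p + gap < len(seq):
                i, j = INDEX[seq[p]], INDEX[seq[p + gap]]
                A[i, j] += 1
                if i != j:
                    A[j, i] += 1  # keep the matrix symmetric
    return A


def spectral_descriptor(A):
    """Summary statistics of the graph Laplacian spectrum of A."""
    laplacian = np.diag(A.sum(axis=1)) - A
    eig = np.linalg.eigvalsh(laplacian)  # real eigenvalues, ascending
    n_zero = float(np.sum(eig < 1e-8))   # isolated residues + components
    return np.array([eig.max(), eig.mean(), eig.std(), n_zero])


def featurize(seq, max_gap=1):
    """Concatenate the vector descriptor (210 values) and spectral one (4)."""
    A = connection_matrix(seq, max_gap)
    vector_part = A[np.triu_indices(20)]
    return np.concatenate([vector_part, spectral_descriptor(A)])


# Toy data: made-up sequences and labels, for illustration only.
peptides = [
    "FLPIIAKLLSGLL", "GLFDIVKKVVGALG", "KWKLFKKIEKVGQ",
    "AAGSTEEDDPQ", "GSSGEETPDDS", "MSTNPKPQRKTK",
]
labels = [1, 1, 1, 0, 0, 0]

X = np.array([featurize(s) for s in peptides])
model = ExtraTreesClassifier(n_estimators=50, random_state=0)
model.fit(X, labels)
print(model.predict(X))
```

Because every feature maps back to a specific residue pair or a spectral summary, the fitted model's `feature_importances_` attribute can be read against those positions, which is one way a tree ensemble on hand-built features stays easier to inspect than a deep network.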
Problem

Research questions and friction points this paper is trying to address.

Cancer Peptide Identification
Feature Understanding
Efficient Model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Top-ML
Cancer-fighting Peptides
Transparent Prediction Logic
Joshua Zhi En Tan
Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
Junjie Wee
Michigan State University, Department of Mathematics, East Lansing, MI 48824, USA; Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
Xue Gong
UC Berkeley
Neuroscience
Kelin Xia
Associate Professor, School of Physical & Mathematical Sciences, Nanyang Technological University
Topological data analysis, Geometric data analysis, Topological deep learning, Mathematical AI