KEMP-PIP: A Feature-Fusion Based Approach for Pro-inflammatory Peptide Prediction

📅 2026-02-22
🤖 AI Summary
This study addresses the high cost and low efficiency of experimentally identifying pro-inflammatory peptides (PIPs) by proposing a hybrid machine learning framework that integrates deep protein embeddings with handcrafted features. The approach combines embeddings from pretrained ESM protein language models, multi-scale k-mer frequencies, physicochemical properties, and modlAMP-derived sequence features. To manage high dimensionality and class imbalance, the method applies feature pruning, class-weighted logistic regression, and ensemble averaging. On the standard benchmark dataset, the model achieves an MCC of 0.505, an accuracy of 0.752, and an AUC of 0.762. It outperforms the current state-of-the-art method, StackPIP, with relative improvements of 9.5% in MCC and 4.8% in both accuracy and AUC.
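As a rough illustration of the multi-scale k-mer frequency features mentioned above, the following sketch computes normalized k-mer frequencies for a peptide at several values of k and concatenates them. The function names, the choice of k = 1 and 2, and the normalization by window count are assumptions made for illustration; the summary does not state which k values or normalization KEMP-PIP uses.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_frequencies(seq, k):
    """Normalized k-mer frequencies over the full 20**k vocabulary (hypothetical helper)."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # Count every overlapping window of length k in the sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Number of windows; guard against sequences shorter than k.
    total = max(len(seq) - k + 1, 1)
    return [counts.get(kmer, 0) / total for kmer in vocab]


def multiscale_kmer_features(seq, ks=(1, 2)):
    """Concatenate k-mer frequency vectors for each k in ks (assumed scales)."""
    features = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


vec = multiscale_kmer_features("ACDA")
print(len(vec))  # 20 one-mers + 400 two-mers = 420
```

Because the vocabulary grows as 20^k, most entries are zero for short peptides, which is one reason a pruning step is useful before fitting a linear model.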

📝 Abstract
Pro-inflammatory peptides (PIPs) play critical roles in immune signaling and inflammation but are difficult to identify experimentally due to costly and time-consuming assays. To address this challenge, we present KEMP-PIP, a hybrid machine learning framework that integrates deep protein embeddings with handcrafted descriptors for robust PIP prediction. Our approach combines contextual embeddings from pretrained ESM protein language models with multi-scale k-mer frequencies, physicochemical descriptors, and modlAMP sequence features. Feature pruning and class-weighted logistic regression manage high dimensionality and class imbalance, while ensemble averaging with an optimized decision threshold enhances the sensitivity-specificity balance. Through systematic ablation studies, we demonstrate that integrating complementary feature sets consistently improves predictive performance. On the standard benchmark dataset, KEMP-PIP achieves an MCC of 0.505, accuracy of 0.752, and AUC of 0.762, outperforming ProIn-fuse, MultiFeatVotPIP, and StackPIP. Relative to StackPIP, these results represent improvements of 9.5% in MCC and 4.8% in both accuracy and AUC. The KEMP-PIP web server is freely available at https://nilsparrow1920-kemp-pip.hf.space/ and the full implementation at https://github.com/S18-Niloy/KEMP-PIP.
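The ensemble averaging and decision-threshold step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the threshold is chosen by sweeping a grid and maximizing MCC on held-out labels, which the abstract does not specify, and the toy probabilities and all function names are invented for the example.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def ensemble_average(prob_lists):
    """Average per-sample positive-class probabilities across models."""
    n_models = len(prob_lists)
    return [sum(col) / n_models for col in zip(*prob_lists)]


def best_threshold(y_true, probs, grid=None):
    """Pick the first grid threshold with the highest MCC (assumed criterion)."""
    grid = grid or [i / 100 for i in range(5, 96)]
    best_t, best_score = 0.5, -1.0
    for t in grid:
        preds = [1 if p >= t else 0 for p in probs]
        score = mcc(y_true, preds)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy, imbalanced validation set: 2 positives, 4 negatives (made-up numbers).
y_val = [1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.4, 0.3, 0.2, 0.1, 0.35]
model_b = [0.7, 0.5, 0.2, 0.3, 0.1, 0.25]

avg_probs = ensemble_average([model_a, model_b])
threshold, score = best_threshold(y_val, avg_probs)
print(threshold, score)
```

In this toy case the default cutoff of 0.5 would miss the second positive, whose averaged probability is 0.45, while a lower tuned threshold recovers it. That is the kind of sensitivity-specificity trade-off a tuned threshold is meant to address on imbalanced data.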
Problem

Research questions and friction points this paper is trying to address.

pro-inflammatory peptides
PIP prediction
feature fusion
machine learning
class imbalance
Innovation

Methods, ideas, or system contributions that make the work stand out.

feature fusion
protein language model
pro-inflammatory peptide prediction
ensemble learning
class imbalance
Soumik Deb Niloy
Department of Computer Science and Engineering, BRAC University, Dhaka 1212, Bangladesh
Md. Fahmid-Ul-Alam Juboraj
Department of Computer Science and Engineering, BRAC University, Dhaka 1212, Bangladesh
Swakkhar Shatabda
Professor, School of Data and Sciences, BRAC University
optimization, machine learning, computational biology, bioinformatics