KEMP-PIP: A Feature-Fusion Based Approach for Pro-inflammatory Peptide Prediction

📅 2026-02-22

🤖 AI Summary
Experimental identification of pro-inflammatory peptides (PIPs) is costly and slow. This study proposes KEMP-PIP, a hybrid machine learning framework that integrates deep protein embeddings with handcrafted features. The approach combines embeddings from pretrained ESM protein language models with multi-scale k-mer frequencies, physicochemical properties, and modlAMP-derived sequence features. Feature pruning and class-weighted logistic regression handle the high dimensionality and class imbalance, and ensemble averaging with an optimized decision threshold balances sensitivity and specificity. On the standard benchmark dataset, the model achieves an MCC of 0.505, accuracy of 0.752, and AUC of 0.762, outperforming ProIn-fuse, MultiFeatVotPIP, and StackPIP. Relative to StackPIP, this corresponds to improvements of 9.5% in MCC and 4.8% in both accuracy and AUC.
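The multi-scale k-mer frequencies mentioned above can be illustrated with a minimal sketch. It is not taken from the KEMP-PIP code: the function names, the choice of k = 1 and 2, and the restriction to the 20 standard amino acids are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids form the k-mer vocabulary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_frequencies(sequence, k):
    """Return the frequency of every possible k-mer, in a fixed order.

    Frequencies are normalized by the number of length-k windows in the
    sequence. Windows containing non-standard residues still count toward
    that total but have no slot in the output vector.
    """
    sequence = sequence.upper()
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    vocabulary = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


def multiscale_kmer_features(sequence, ks=(1, 2)):
    """Concatenate the k-mer frequency vectors for each k in `ks`."""
    features = []
    for k in ks:
        features.extend(kmer_frequencies(sequence, k))
    return features


features = multiscale_kmer_features("ACDA")
print(len(features))  # 20 single residues + 400 dipeptides = 420
```

The vector length grows as 20^k, which is one reason a pipeline that fuses such features with language-model embeddings would need some form of feature pruning.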

📝 Abstract
Pro-inflammatory peptides (PIPs) play critical roles in immune signaling and inflammation but are difficult to identify experimentally due to costly and time-consuming assays. To address this challenge, we present KEMP-PIP, a hybrid machine learning framework that integrates deep protein embeddings with handcrafted descriptors for robust PIP prediction. Our approach combines contextual embeddings from pretrained ESM protein language models with multi-scale k-mer frequencies, physicochemical descriptors, and modlAMP sequence features. Feature pruning and class-weighted logistic regression manage high dimensionality and class imbalance, while ensemble averaging with an optimized decision threshold enhances the sensitivity-specificity balance. Through systematic ablation studies, we demonstrate that integrating complementary feature sets consistently improves predictive performance. On the standard benchmark dataset, KEMP-PIP achieves an MCC of 0.505, accuracy of 0.752, and AUC of 0.762, outperforming ProIn-fuse, MultiFeatVotPIP, and StackPIP. Relative to StackPIP, these results represent improvements of 9.5% in MCC and 4.8% in both accuracy and AUC. The KEMP-PIP web server is freely available at https://nilsparrow1920-kemp-pip.hf.space/ and the full implementation at https://github.com/S18-Niloy/KEMP-PIP.
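As a rough illustration of ensemble averaging with an optimized decision threshold, the sketch below averages positive-class probabilities from several models and scans candidate thresholds on validation data. Several things here are assumptions rather than details from the paper: the abstract does not say which criterion the threshold optimizes (MCC is assumed), the helper names and threshold grid are hypothetical, and the toy labels and probabilities are made up. The upstream models, for example class-weighted logistic regressions, are not shown.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def ensemble_average(prob_lists):
    """Average per-sample positive-class probabilities across models."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def best_threshold(y_true, probs, candidates=None):
    """Return the (threshold, MCC) pair with the highest MCC on the given data."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 96)]  # hypothetical grid: 0.05 to 0.95
    best_t, best_score = 0.5, -1.0
    for t in candidates:
        preds = [1 if p >= t else 0 for p in probs]
        score = mcc(y_true, preds)
        if score > best_score:  # ties keep the lowest threshold
            best_t, best_score = t, score
    return best_t, best_score


# Toy validation data, made up for illustration only (imbalanced: 3 positives, 5 negatives).
y_val = [1, 1, 1, 0, 0, 0, 0, 0]
model_probs = [
    [0.70, 0.45, 0.40, 0.35, 0.20, 0.10, 0.30, 0.05],
    [0.80, 0.55, 0.30, 0.25, 0.30, 0.20, 0.10, 0.15],
]
avg_probs = ensemble_average(model_probs)
threshold, score = best_threshold(y_val, avg_probs)
print(threshold, score)
```

With imbalanced classes, the threshold that maximizes a balanced metric such as MCC is often not 0.5, which is the motivation for tuning it on held-out data instead of using the default.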
Problem

Research questions and friction points this paper is trying to address.

pro-inflammatory peptides
PIP prediction
feature fusion
machine learning
class imbalance
Innovation

Methods, ideas, or system contributions that make the work stand out.

feature fusion
protein language model
pro-inflammatory peptide prediction
ensemble learning
class imbalance
Soumik Deb Niloy
Department of Computer Science and Engineering, BRAC University, Dhaka 1212, Bangladesh
Md. Fahmid-Ul-Alam Juboraj
Department of Computer Science and Engineering, BRAC University, Dhaka 1212, Bangladesh
Swakkhar Shatabda
Professor, School of Data and Sciences, BRAC University
optimization, machine learning, computational biology, bioinformatics