Accelerating MHC-II Epitope Discovery via Multi-Scale Prediction in Antigen Presentation

📅 2025-12-15
🤖 AI Summary
This study addresses core challenges in MHC-II epitope prediction, including complex binding specificity, ambiguous sequence motifs, scarce high-quality data, and insufficient standardization. It constructs a standardized, biology-enriched, multi-source MHC-II antigenic epitope dataset and formulates three biologically grounded tasks (peptide-MHC binding, peptide presentation, and antigen presentation) that progressively cover more of the MHC-II antigen presentation pathway. A multi-scale evaluation framework benchmarks existing models, and a modular framework supports a systematic comparison of modeling designs. The work provides a curated data resource and a methodological foundation for ML-guided epitope discovery in computational immunotherapy.

📝 Abstract
Antigenic epitopes presented by major histocompatibility complex II (MHC-II) proteins play an essential role in immunotherapy. However, compared to MHC-I, which is more widely studied in computational immunotherapy, MHC-II antigenic epitopes pose significantly more challenges because of their complex binding specificity and ambiguous motif patterns. Consequently, existing datasets for MHC-II interactions are smaller and less standardized than those available for MHC-I. To address these challenges, we present a well-curated dataset derived from the Immune Epitope Database (IEDB) and other public sources. It not only extends and standardizes existing peptide-MHC-II datasets, but also introduces a novel antigen-MHC-II dataset with richer biological context. Leveraging this dataset, we formulate three major machine learning (ML) tasks: peptide binding, peptide presentation, and antigen presentation. These tasks progressively capture the broader biological processes within the MHC-II antigen presentation pathway. We further employ a multi-scale evaluation framework to benchmark existing models, along with a comprehensive analysis of various modeling designs for this problem within a modular framework. Overall, this work serves as a valuable resource for advancing computational immunotherapy, providing a foundation for future research in ML-guided epitope discovery and predictive modeling of immune responses.
Problem

Research questions and friction points this paper is trying to address.

Develops a curated dataset for MHC-II epitope prediction
Formulates three ML tasks to model antigen presentation
Provides a multi-scale framework to benchmark existing models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curated dataset from IEDB and public sources
Three ML tasks: peptide binding, peptide presentation, and antigen presentation
Multi-scale evaluation framework with modular design
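
To make the task formulation above more concrete, here is a minimal Python sketch of how examples from the three tasks might be stored and scored separately. Everything in it is an assumption for illustration: the `Example` record, its field names, the task-name strings, and the choice of AUROC as the metric do not come from the paper, which may define its inputs, labels, and multi-scale evaluation differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Example:
    """One labeled example. All field names here are hypothetical."""
    task: str      # e.g. "peptide_binding", "peptide_presentation", "antigen_presentation"
    allele: str    # MHC-II allele identifier
    sequence: str  # peptide sequence, or a full antigen sequence for the antigen-level task
    label: int     # 1 = positive, 0 = negative


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by comparing every
    positive/negative pair of scores (a tie counts as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def evaluate_by_task(
    examples: Sequence[Example], scores: Sequence[float]
) -> Dict[str, float]:
    """Group predictions by task name and report one AUROC per task."""
    labels_by_task: Dict[str, List[int]] = defaultdict(list)
    scores_by_task: Dict[str, List[float]] = defaultdict(list)
    for ex, score in zip(examples, scores):
        labels_by_task[ex.task].append(ex.label)
        scores_by_task[ex.task].append(score)
    return {
        task: auroc(labels_by_task[task], scores_by_task[task])
        for task in labels_by_task
    }
```

Calling `evaluate_by_task(examples, scores)` with model scores aligned to a list of `Example` records returns a dictionary mapping each task name to its AUROC. Reporting one number per task, rather than a single pooled score, is one simple way to keep results at different biological scales from masking each other.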
Yue Wan
Department of Computer Science, University of Pittsburgh
Jiayi Yuan
Rice University
Machine Learning, Large Language Models
Zhiwei Feng
School of Pharmacy, University of Pittsburgh
Xiaowei Jia
Department of Computer Science, University of Pittsburgh