🤖 AI Summary
MHC-II epitope prediction is hindered by complex binding specificity, ambiguous sequence motifs, scarce high-quality data, and a lack of standardization. To address these challenges, this study constructs the first standardized, biology-enriched, multi-source MHC-II antigenic epitope dataset. It proposes a hierarchical prediction framework covering three biologically grounded tasks (peptide–MHC binding, peptide presentation, and antigen presentation), integrated via multi-task learning, a modular deep architecture, and cross-scale modeling of immunological processes. A unified, multi-scale evaluation benchmark enables rigorous and comparable assessment. The approach achieves significant improvements in prediction accuracy across all tasks. The work delivers a scalable, high-quality data resource, a principled methodological foundation, and a new AI-driven paradigm for personalized vaccine design in computational immuno-oncology.
📝 Abstract
Antigenic epitopes presented by major histocompatibility complex II (MHC-II) proteins play an essential role in immunotherapy. However, compared with the more widely studied MHC-I, MHC-II antigenic epitopes pose significantly greater challenges for computational immunotherapy because of their complex binding specificity and ambiguous motif patterns. Consequently, existing datasets for MHC-II interactions are smaller and less standardized than those available for MHC-I. To address these challenges, we present a well-curated dataset derived from the Immune Epitope Database (IEDB) and other public sources. It not only extends and standardizes existing peptide-MHC-II datasets, but also introduces a novel antigen-MHC-II dataset with richer biological context. Leveraging this dataset, we formulate three major machine learning (ML) tasks: peptide binding, peptide presentation, and antigen presentation. These tasks progressively capture broader biological processes within the MHC-II antigen presentation pathway. We further employ a multi-scale evaluation framework to benchmark existing models, and we use a modular framework to comprehensively analyze various modeling designs for this problem. Overall, this work serves as a valuable resource for advancing computational immunotherapy, providing a foundation for future research in ML-guided epitope discovery and predictive modeling of immune responses.