TCR-EML: Explainable Model Layers for TCR-pMHC Prediction

📅 2025-10-05

🤖 AI Summary

Problem: Existing TCR-pMHC binding prediction methods rely on black-box Transformer models that lack biological interpretability, and post-hoc explanation techniques do not explicitly encode established biochemical mechanisms such as known binding regions. Method: The paper proposes an explain-by-design architecture, TCR-EML, that adds interpretable prototype layers to protein language model backbones; the prototypes represent amino acid residue contacts drawn from known TCR-pMHC binding mechanisms, so the model's rationale can be inspected directly after training. Contribution: The model achieves competitive predictive accuracy and generalization on large-scale datasets and improved explainability over existing approaches on the TCR-XAI benchmark. According to the abstract, explain-by-design models had not previously been applied to TCR-pMHC binding.

📝 Abstract

T cell receptor (TCR) recognition of peptide-MHC (pMHC) complexes is a central component of adaptive immunity, with implications for vaccine design, cancer immunotherapy, and autoimmune disease. While recent advances in machine learning have improved prediction of TCR-pMHC binding, the most effective approaches are black-box transformer models that cannot provide a rationale for their predictions. Post-hoc explanation methods can provide insight with respect to the input but do not explicitly model the biochemical mechanisms of TCR-pMHC binding (e.g., known binding regions). "Explain-by-design" models (i.e., models with architectural components that can be examined directly after training) have been explored in other domains, but have not been used for TCR-pMHC binding. We propose explainable model layers (TCR-EML) that can be incorporated into protein-language model backbones for TCR-pMHC modeling. Our approach uses prototype layers for amino acid residue contacts drawn from known TCR-pMHC binding mechanisms, enabling high-quality explanations for predicted TCR-pMHC binding. Experiments with our proposed method on large-scale datasets demonstrate competitive predictive accuracy and generalization, and evaluation on the TCR-XAI benchmark demonstrates improved explainability compared with existing approaches.

Problem

Research questions and friction points this paper is trying to address.

Predicting TCR-pMHC binding using explainable machine learning models

Addressing black-box limitations in current TCR recognition prediction methods

Incorporating biochemical binding mechanisms into interpretable model architecture

Innovation

Methods, ideas, or system contributions that make the work stand out.

Explainable model layers for TCR-pMHC binding prediction

Prototype layers based on known biochemical binding mechanisms

Integration with protein-language model backbones for accuracy
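
To make the prototype-layer idea concrete, here is a minimal sketch in plain Python. It is not the TCR-EML implementation: the class name ContactPrototypeLayer, the exp(-distance) similarity, the max-pooling over residue pairs, and the toy two-dimensional embeddings are all assumptions chosen for illustration. In a real system the per-residue embeddings would come from a protein-language model backbone and the prototypes would be learned rather than written by hand.

```python
import math


def similarity(feature, prototype):
    """Similarity in (0, 1]; equals 1.0 when the feature matches the prototype exactly."""
    d2 = sum((f - q) ** 2 for f, q in zip(feature, prototype))
    return math.exp(-d2)


class ContactPrototypeLayer:
    """Hypothetical prototype layer over TCR-peptide residue pairs (illustrative only)."""

    def __init__(self, prototypes, weights, bias=0.0):
        if len(prototypes) != len(weights):
            raise ValueError("need exactly one weight per prototype")
        self.prototypes = prototypes
        self.weights = weights
        self.bias = bias

    def forward(self, tcr_emb, pep_emb):
        """Return (binding probability, per-prototype evidence).

        Each evidence entry is (activation, tcr_index, peptide_index): the
        residue pair that best matches that prototype. Reading these entries
        is the 'explanation'; no post-hoc attribution step is involved.
        """
        evidence = []
        logit = self.bias
        for proto, w in zip(self.prototypes, self.weights):
            best = (-1.0, None, None)
            for i, t in enumerate(tcr_emb):
                for j, p in enumerate(pep_emb):
                    # Pair feature: concatenated TCR and peptide residue embeddings.
                    s = similarity(t + p, proto)
                    if s > best[0]:
                        best = (s, i, j)
            evidence.append(best)
            logit += w * best[0]
        prob = 1.0 / (1.0 + math.exp(-logit))
        return prob, evidence


# Toy inputs standing in for backbone embeddings (assumed values, not real data).
tcr_emb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
pep_emb = [[0.5, 0.5], [0.0, 0.0], [0.0, 1.0]]
# One hand-written prototype that matches TCR residue 1 paired with peptide residue 2.
layer = ContactPrototypeLayer(prototypes=[[1.0, 0.0, 0.0, 1.0]], weights=[2.0], bias=-1.0)
prob, evidence = layer.forward(tcr_emb, pep_emb)
print(prob, evidence)
```

The design point is that the prediction is a weighted sum of prototype activations, so each score can be traced back to a specific residue pair, which is what distinguishes an explain-by-design layer from a post-hoc explanation of a black-box model.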

🔎 Similar Papers

tcrLM: a lightweight protein language model for predicting T cell receptor and epitope binding specificity