🤖 AI Summary
Problem: Existing TCR-pMHC binding prediction methods rely on black-box Transformer models that lack biological interpretability, and post-hoc explanation techniques do not explicitly encode established biochemical mechanisms such as known binding regions.
Method: We propose an explain-by-design architecture (TCR-EML) that adds a biologically grounded, interpretable prototype layer, incorporating structural and functional priors, to explicitly model residue-level contacts between TCR and pMHC. The approach combines protein language models, attention mechanisms, and biophysical constraints to jointly perform contact site identification and semantic understanding.
Contribution: The model achieves competitive predictive accuracy and generalization on large-scale datasets and improves explainability over existing approaches on the TCR-XAI benchmark. To our knowledge, it is the first explain-by-design method for TCR-pMHC binding, pairing strong predictive performance with mechanism-driven, biologically grounded interpretability.
📝 Abstract
T cell receptor (TCR) recognition of peptide-MHC (pMHC) complexes is a central component of adaptive immunity, with implications for vaccine design, cancer immunotherapy, and autoimmune disease. While recent advances in machine learning have improved prediction of TCR-pMHC binding, the most effective approaches are black-box transformer models that cannot provide a rationale for their predictions. Post-hoc explanation methods can attribute a prediction to parts of the input, but they do not explicitly model the biochemical mechanisms that govern TCR-pMHC binding (e.g., known binding regions). "Explain-by-design" models (i.e., models with architectural components that can be examined directly after training) have been explored in other domains, but have not been used for TCR-pMHC binding. We propose explainable model layers (TCR-EML) that can be incorporated into protein-language model backbones for TCR-pMHC modeling. Our approach uses prototype layers for amino acid residue contacts drawn from known TCR-pMHC binding mechanisms, enabling high-quality explanations for predicted TCR-pMHC binding. Experiments with the proposed method on large-scale datasets demonstrate competitive predictive accuracy and generalization, and evaluation on the TCR-XAI benchmark demonstrates improved explainability compared with existing approaches.
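To make the idea of a prototype layer over residue contacts concrete, the following is a minimal sketch in plain Python, not the TCR-EML implementation. It assumes that per-residue embeddings are already available (for example from a protein language model), that a TCR-peptide residue pair is represented by the elementwise product of its two embeddings, and that similarity to a prototype is `exp(-squared distance)`. The function names (`pair_features`, `similarity`, `prototype_layer`), the toy embeddings, and the weights are all hypothetical choices made for illustration.

```python
import math

def pair_features(tcr_emb, pep_emb):
    """Feature vector for every (TCR residue, peptide residue) pair.

    Assumption: a pair is represented by the elementwise product of the
    two residue embeddings. Real models may use a different pairing.
    """
    return {
        (i, j): [a * b for a, b in zip(t, p)]
        for i, t in enumerate(tcr_emb)
        for j, p in enumerate(pep_emb)
    }

def similarity(x, proto):
    """Similarity in (0, 1]: equals 1 when x matches the prototype exactly."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, proto))
    return math.exp(-d2)

def prototype_layer(tcr_emb, pep_emb, prototypes, weights, bias=0.0):
    """Score binding from the best-matching residue pair of each prototype.

    Returns the binding probability, the per-prototype activations, and
    the residue pair that activated each prototype most strongly. The
    returned pairs are what can be inspected as an explanation.
    """
    feats = pair_features(tcr_emb, pep_emb)
    activations, evidence = [], []
    for proto in prototypes:
        best_pair, best_sim = max(
            ((pair, similarity(f, proto)) for pair, f in feats.items()),
            key=lambda item: item[1],
        )
        activations.append(best_sim)
        evidence.append(best_pair)
    logit = bias + sum(w * a for w, a in zip(weights, activations))
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, activations, evidence

# Toy inputs (hypothetical): 3 TCR residues and 2 peptide residues, 2-d embeddings.
tcr = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pep = [[0.5, 0.5], [2.0, -1.0]]
# Two hand-picked "contact prototypes" standing in for learned ones.
protos = [[2.0, -1.0], [0.0, 0.5]]

prob, acts, pairs = prototype_layer(tcr, pep, protos, weights=[1.0, 1.0], bias=-1.0)
print(prob, acts, pairs)
```

The design point this illustrates is that the prediction is a simple function of a few prototype activations, and each activation traces back to one specific TCR-peptide residue pair, so the explanation is read from the model itself rather than computed after the fact.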