PLM-eXplain: Divide and Conquer the Protein Embedding Space

📅 2025-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Protein language models (PLMs) exhibit strong predictive performance but suffer from limited biological interpretability, hindering their utility in mechanistic analysis and translational applications in computational biology. To address this, we propose PLM-X, an interpretable adapter framework that orthogonally decomposes the PLM embedding space into two subspaces: a biochemically grounded prior subspace—encoding domain-knowledge features such as secondary structure propensity and hydrophobicity—and a residual predictive subspace. Leveraging a lightweight adapter architecture, biochemistry-informed orthogonal projection, and multi-task fine-tuning, PLM-X enables faithful decision attribution without sacrificing predictive accuracy. Evaluated on three biologically significant tasks—extracellular vesicle association prediction, transmembrane helix identification, and aggregation propensity estimation—PLM-X achieves state-of-the-art performance while generating intuitive, experimentally verifiable biological interpretations of model decisions.
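The "biochemistry-informed orthogonal projection" described above can be illustrated with a minimal sketch: fit linear directions in the embedding space that predict known biochemical features, then split each embedding into its projection onto that subspace and an orthogonal residual. The toy data, the least-squares probe, and all variable names here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins: PLM embeddings (n x d) and known biochemical
# features (n x k), e.g. hydropathy and helix propensity scores.
n, d, k = 200, 32, 3
E = rng.normal(size=(n, d))
F = rng.normal(size=(n, k))

# Least-squares probe F ≈ E @ W; the columns of W span a
# feature-aligned ("interpretable") subspace of the embedding space.
W, *_ = np.linalg.lstsq(E, F, rcond=None)

# Orthonormal basis for that subspace and the corresponding projector.
Q, _ = np.linalg.qr(W)   # d x k orthonormal basis
P = Q @ Q.T              # d x d orthogonal projector

E_interp = E @ P         # biochemically grounded component
E_resid = E - E_interp   # residual component, orthogonal to the first

# The two parts are orthogonal and reconstruct the original embedding.
assert np.allclose(E_interp + E_resid, E)
assert np.allclose(E_interp @ E_resid.T, 0.0, atol=1e-8)
```

Downstream, a classifier trained on the concatenation of both parts keeps the predictive signal, while attributions on `E_interp` can be read back in terms of the named biochemical features.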

📝 Abstract
Protein language models (PLMs) have revolutionised computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and translation to actionable insights. We present an explainable adapter layer, PLM-eXplain (PLM-X), that bridges this gap by factoring PLM embeddings into two components: an interpretable subspace based on established biochemical features, and a residual subspace that preserves the model's predictive power. Using embeddings from ESM2, our adapter incorporates well-established properties, including secondary structure and hydropathy, while maintaining high performance. We demonstrate the effectiveness of our approach across three protein-level classification tasks: prediction of extracellular vesicle association, identification of transmembrane helices, and prediction of aggregation propensity. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalisable solution for enhancing PLM interpretability across various downstream applications. This work addresses a critical need in computational biology by providing a bridge between powerful deep learning models and actionable biological insights.
Problem

Research questions and friction points this paper is trying to address.

Bridging gap between protein language models and biological interpretation
Factoring PLM embeddings into interpretable and residual components
Enhancing PLM interpretability without sacrificing predictive accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explainable adapter layer for PLM embeddings
Factor embeddings into interpretable and residual subspaces
Maintains performance while enabling biological interpretation
Jan van Eck
AI Technology for Life, Department of Computing and Information Sciences, Department of Biology, Utrecht University, Utrecht, Netherlands
Dea Gogishvili
AI Technology for Life, Department of Computing and Information Sciences, Department of Biology, Utrecht University, Utrecht, Netherlands
Wilson Silva
Assistant Professor, AI Technology for Life, Utrecht University
Machine Learning, Computer Vision, Explainable AI, Medical Image Analysis, Privacy
Sanne Abeln
Professor of AI Technology for Life, Utrecht University
AI for the Life Sciences, Protein Bioinformatics, Genomic Alterations, Neurodegenerative Disease