A Matrix Variational Auto-Encoder for Variant Effect Prediction in Pharmacogenes

📅 2025-07-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional variant effect predictors perform poorly on pharmacogenes under weak evolutionary constraint, where multiple sequence alignments (MSAs) carry insufficient signal. Method: The authors propose matVAE, a transformer-based matrix variational autoencoder with a structured prior. Two variants are studied: matVAE-MSA, trained on MSAs and evaluated zero-shot on deep mutational scanning (DMS) data, and matENC-DMS, a model of similar capacity trained directly on DMS data. Results: matVAE-MSA outperforms the state-of-the-art DeepSequence model in zero-shot prediction on 33 ProteinGym DMS datasets while using an order of magnitude fewer parameters and less computation at inference time; matENC-DMS performs better on supervised prediction tasks. Incorporating AlphaFold-generated structures further improves the transformer model, reaching performance comparable to DeepSequence trained on MSAs and fine-tuned on DMS. These findings suggest that DMS datasets can substitute for MSAs without significant loss in predictive performance for variants in low-conservation drug target and ADME proteins.

📝 Abstract
Variant effect predictors (VEPs) aim to assess the functional impact of protein variants, traditionally relying on multiple sequence alignments (MSAs). This approach assumes that naturally occurring variants are fit, an assumption challenged by pharmacogenomics, where some pharmacogenes experience low evolutionary pressure. Deep mutational scanning (DMS) datasets provide an alternative by offering quantitative fitness scores for variants. In this work, we propose a transformer-based matrix variational auto-encoder (matVAE) with a structured prior and evaluate its performance on 33 DMS datasets corresponding to 26 drug target and ADME proteins from the ProteinGym benchmark. Our model trained on MSAs (matVAE-MSA) outperforms the state-of-the-art DeepSequence model in zero-shot prediction on DMS datasets, despite using an order of magnitude fewer parameters and requiring less computation at inference time. We also compare matVAE-MSA to matENC-DMS, a model of similar capacity trained on DMS data, and find that the latter performs better on supervised prediction tasks. Additionally, incorporating AlphaFold-generated structures into our transformer model further improves performance, achieving results comparable to DeepSequence trained on MSAs and finetuned on DMS. These findings highlight the potential of DMS datasets to replace MSAs without significant loss in predictive performance, motivating further development of DMS datasets and exploration of their relationships to enhance variant effect prediction.
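The zero-shot setting described above scores a variant by how plausible the model finds the mutated sequence relative to the wild type, typically via an ELBO approximation of the sequence log-likelihood. Below is a minimal, hypothetical numpy sketch of that scoring logic: the real matVAE uses a transformer encoder/decoder and a structured prior, whereas this toy stand-in uses linear maps and a standard normal prior, and the names `ToyMatrixVAE` and `zero_shot_score` are illustrative, not from the paper.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {a: i for i, a in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a protein sequence as an L x 20 matrix (matrix input, as in matVAE)."""
    x = np.zeros((len(seq), 20))
    for i, a in enumerate(seq):
        x[i, AA_INDEX[a]] = 1.0
    return x


class ToyMatrixVAE:
    """Toy stand-in: linear encoder/decoder instead of the paper's transformer,
    and an isotropic Gaussian prior instead of the paper's structured prior."""

    def __init__(self, seq_len, latent_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        d = seq_len * 20
        self.W_mu = rng.normal(0.0, 0.01, (d, latent_dim))
        self.W_logvar = rng.normal(0.0, 0.01, (d, latent_dim))
        self.W_dec = rng.normal(0.0, 0.01, (latent_dim, d))
        self.seq_len = seq_len
        self.sample_seed = seed

    def elbo(self, x, n_samples=16):
        """Monte-Carlo ELBO: E_q[log p(x|z)] - KL(q(z|x) || N(0, I))."""
        rng = np.random.default_rng(self.sample_seed)  # fixed seed: deterministic
        v = x.reshape(-1)
        mu = v @ self.W_mu
        logvar = v @ self.W_logvar
        kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
        rec = 0.0
        for _ in range(n_samples):
            z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)
            logits = (z @ self.W_dec).reshape(self.seq_len, 20)
            # per-position log-softmax over the 20 amino acids
            logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
            rec += np.sum(x * logp)
        return rec / n_samples - kl


def zero_shot_score(model, wild_type, variant):
    """Approximate log p(variant) - log p(wild type); lower = more damaging."""
    return model.elbo(one_hot(variant)) - model.elbo(one_hot(wild_type))
```

With untrained random weights the scores are of course meaningless; the sketch only illustrates how a sequence-level generative model yields a variant score with no labels, which is what allows the MSA-trained model to be evaluated zero-shot on DMS fitness data.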
Problem

Research questions and friction points this paper is trying to address.

Predicts functional impact of protein variants in pharmacogenes
Compares MSA and DMS data for variant effect prediction
Improves prediction using transformer-based models and AlphaFold structures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based matrix variational auto-encoder
Structured prior for variant effect prediction
Incorporates AlphaFold structures for performance boost
Antoine Honoré
Division of Information Science and Engineering, KTH, Stockholm, Sweden
Borja Rodríguez Gálvez
Researcher, KTH Royal Institute of Technology
machine learning, information theory, generalization, privacy, fairness
Yoomi Park
Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
Yitian Zhou
Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
Volker M. Lauschke
Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
Ming Xiao
Professor, KTH
Network and Channel Coding, Wireless Communications, Machine Learning