Exploring zero-shot structure-based protein fitness prediction

📅 2025-04-23
🤖 AI Summary
Problem: Computationally predicted structures often introduce bias in fitness prediction for variants in intrinsically disordered regions (IDRs), undermining reliability. Method: We propose a zero-shot, fine-tuning-free multimodal framework that jointly leverages pretrained protein language models (e.g., ESM) and AlphaFold2-predicted structures, while emphasizing that experimentally determined structures are necessary for accurate modeling. A simple multimodal integration enables sequence–structure co-modeling. Contributions/Results: (1) We show systematically that computational structure prediction exacerbates bias in IDRs and that high-resolution experimental structures are decisive for robust fitness prediction; (2) we identify structural quality and local disorder-to-order propensity as key determinants of zero-shot performance; (3) on the ProteinGym benchmark, our lightweight multimodal ensemble is a strong, plug-and-play zero-shot baseline that supports reliable interpretation of genetic variants and rational protein engineering without task-specific adaptation.

📝 Abstract
The ability to make zero-shot predictions about the fitness consequences of protein sequence changes with pre-trained machine learning models enables many practical applications. Such models can be applied to downstream tasks like genetic variant interpretation and protein engineering without additional labeled data. The advent of capable protein structure prediction tools has made orders of magnitude more precomputed predicted structures available, giving rise to powerful structure-based fitness prediction models. Through our experiments, we assess several modeling choices for structure-based models and their effects on downstream fitness prediction. Zero-shot fitness prediction models can struggle to assess the fitness landscape within disordered regions of proteins, that is, regions that lack a fixed 3D structure. We confirm the importance of matching protein structures to fitness assays and find that predicted structures for disordered regions can be misleading and can reduce predictive performance. Lastly, we evaluate an additional structure-based model on the ProteinGym substitution benchmark and show that simple multi-modal ensembles are strong baselines.

Problem

Research questions and friction points this paper is trying to address.

Zero-shot prediction of protein fitness changes using pre-trained models
Impact of protein structure modeling on fitness prediction accuracy
Challenges in assessing fitness within disordered protein regions
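
One way to study the disordered-region problem listed above is to separate variants by whether they fall in a likely disordered stretch. The sketch below is illustrative only and is not the authors' procedure. It assumes per-residue pLDDT confidence values from a predicted structure are available as a plain list, uses the common rule of thumb that pLDDT below 50 may indicate disorder, and assumes single-substitution identifiers of the form `A23G`. The function names and example values are hypothetical.

```python
def disordered_positions(plddt, threshold=50.0):
    """Return 1-based residue positions whose pLDDT is below the threshold.

    Assumption: low pLDDT is used as a rough proxy for disorder.
    """
    return {pos for pos, value in enumerate(plddt, start=1) if value < threshold}


def split_variants(variants, disordered):
    """Split single-substitution variants (e.g. 'A23G') by region type.

    Returns (ordered_variants, disordered_variants), preserving input order.
    """
    ordered_variants, disordered_variants = [], []
    for variant in variants:
        # Strip the wild-type and mutant letters to get the position.
        position = int(variant[1:-1])
        if position in disordered:
            disordered_variants.append(variant)
        else:
            ordered_variants.append(variant)
    return ordered_variants, disordered_variants


# Hypothetical example: a five-residue protein with two low-confidence residues.
example_plddt = [92.1, 88.4, 45.0, 38.2, 71.5]
example_variants = ["M1A", "K3E", "L4P", "G5S"]

flagged = disordered_positions(example_plddt)
in_ordered, in_disordered = split_variants(example_variants, flagged)
```

Scoring each group separately against the assay measurements would then show whether a model's performance differs between ordered and disordered regions.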

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot protein fitness prediction with pre-trained models
Structure-based models using predicted protein structures
Multi-modal ensembles for improved predictive performance
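
The multi-modal ensemble idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes each model produces one zero-shot score per variant, standardizes each model's scores so that different scales are comparable, and averages them. The averaging rule, the function names, and the example numbers are all assumptions. Spearman rank correlation against measured fitness is used for evaluation, as is standard on ProteinGym.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Standardize scores to zero mean and unit variance."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def ensemble(score_lists):
    """Average z-scored predictions from several models, variant by variant."""
    normalized = [zscore(scores) for scores in score_lists]
    return [mean(column) for column in zip(*normalized)]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for four variants from two models, plus assay values.
sequence_scores = [-1.2, -0.3, -2.5, 0.1]
structure_scores = [-0.8, -0.9, -3.0, 0.4]
measured_fitness = [0.4, 0.7, 0.1, 0.9]

combined = ensemble([sequence_scores, structure_scores])
rho = spearman(combined, measured_fitness)
```

Standardizing before averaging keeps a model with a wider score range from dominating the ensemble. Rank-based averaging would be an equally simple alternative.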

Authors

Arnav Sharma
Department of Computer Sciences, Morgridge Institute for Research, University of Wisconsin-Madison
Anthony Gitter
Associate Professor, University of Wisconsin-Madison; Morgridge Institute for Research
Computational biology, Bioinformatics