🤖 AI Summary
This study addresses the problem of efficiently and accurately predicting mutation-induced changes in protein–protein binding free energy (ΔΔG), where existing deep learning methods are less accurate than empirical force-field estimates, presumably because experimental binding data are scarce. We propose StaB-ddG, a transfer-learning framework that expresses the binding energy as the folding energy of the complex minus the sum of the folding energies of its binding partners, using a pretrained inverse-folding model as a zero-shot proxy for folding energy. The model is then fine-tuned on abundant folding stability measurements and on scarcer experimental binding measurements. StaB-ddG achieves accuracy comparable to the empirical force-field tool FoldX (Pearson *r* ≈ 0.65) while running inference over 1,000× faster. Its key contribution is a physically motivated, folding-energy-based parameterization of binding energy that reduces dependence on binding-specific labels and enables high-throughput analysis of mutational effects.
📝 Abstract
Accurate estimation of mutational effects on protein-protein binding energies is an open problem with applications in structural biology and therapeutic design. Several deep learning predictors for this task have been proposed, but, presumably due to the scarcity of binding data, these methods underperform computationally expensive estimates based on empirical force fields. In response, we propose a transfer-learning approach that leverages advances in protein sequence modeling and folding stability prediction for this task. The key idea is to parameterize the binding energy as the difference between the folding energy of the protein complex and the sum of the folding energies of its binding partners. We show that using a pre-trained inverse-folding model as a proxy for folding energy provides strong zero-shot performance, and can be fine-tuned with (1) copious folding energy measurements and (2) more limited binding energy measurements. The resulting predictor, StaB-ddG, is the first deep learning predictor to match the accuracy of the state-of-the-art empirical force-field method FoldX, while offering an over 1,000x speed-up.
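The parameterization described in the abstract can be illustrated with a minimal Python sketch. It is not the StaB-ddG implementation: the function names (`binding_energy`, `binding_ddg`, `toy_fold_energy`) are hypothetical, the sign convention (mutant minus wild type, so positive values mean weaker binding) is an assumption, and the toy scorer only stands in for a real folding-energy proxy such as an inverse-folding model's negative log-likelihood.

```python
from typing import Callable, Sequence

# Hypothetical scorer type: takes the chain sequences of a structure and
# returns a scalar proxy for its folding energy (lower = more stable).
FoldEnergy = Callable[[Sequence[str]], float]


def binding_energy(a: Sequence[str], b: Sequence[str], fold_energy: FoldEnergy) -> float:
    """Folding energy of the complex minus the sum of the partners' folding energies."""
    complex_chains = list(a) + list(b)
    return fold_energy(complex_chains) - (fold_energy(a) + fold_energy(b))


def binding_ddg(wt_a: Sequence[str], wt_b: Sequence[str],
                mut_a: Sequence[str], mut_b: Sequence[str],
                fold_energy: FoldEnergy) -> float:
    """Change in binding energy on mutation (assumed convention: mutant minus wild type)."""
    return (binding_energy(mut_a, mut_b, fold_energy)
            - binding_energy(wt_a, wt_b, fold_energy))


def toy_fold_energy(chains: Sequence[str]) -> float:
    """Toy stand-in for a learned scorer; NOT a physical model.

    Each leucine contributes -1 within a chain; each tryptophan contributes
    an extra -2 only when more than one chain is present (a crude interface term).
    """
    intra = -sum(chain.count("L") for chain in chains)
    inter = -2 * sum(chain.count("W") for chain in chains) if len(chains) > 1 else 0
    return float(intra + inter)


if __name__ == "__main__":
    wt_a, wt_b = ["ACDW"], ["KLMW"]
    mut_a = ["ACDA"]  # W4A mutation in partner A; partner B is unchanged
    print(binding_ddg(wt_a, wt_b, mut_a, wt_b, toy_fold_energy))
```

Because the monomer terms are subtracted, any contribution a mutation makes to the stability of an isolated partner cancels out, and only the part of the folding energy that depends on the interface remains. In this sketch, swapping `toy_fold_energy` for a learned scorer would not require changing `binding_energy` or `binding_ddg`.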