🤖 AI Summary
This study addresses zero-shot prediction of protein thermodynamic stability. It shows that the amino acid preferences of inverse folding models (e.g., ProteinMPNN) reflect sequence contributions to folding free energy. Prior work heuristically treats log-likelihood ratios as free energy differences; here, a statistical-mechanics derivation yields a physically grounded evaluation pathway centered on free energy differences, which lets model outputs be recalibrated for physical consistency. Using lightweight post-hoc correction alone, the method improves zero-shot prediction performance across multiple benchmarks, including ThermoNet, DeepDDG, and ProTherm, with average Spearman correlation gains of 0.12–0.23. It also establishes, for the first time, an interpretable and quantifiable physical link between inverse folding model outputs and thermodynamic stability, offering a physics-informed approach to protein design in data-scarce regimes.
📝 Abstract
Inverse folding models have proven to be highly effective zero-shot predictors of protein stability. Despite this success, the link between the amino acid preferences of an inverse folding model and the free-energy considerations underlying thermodynamic stability remains incompletely understood. A better understanding would not only be of theoretical interest but could also provide the basis for stronger zero-shot stability prediction. In this paper, we take steps to clarify the free-energy foundations of inverse folding models. Our derivation reveals the standard practice of scoring with likelihood ratios to be a simplistic approximation and suggests several paths towards better estimates of relative stability. We empirically assess these approaches and demonstrate that considerable gains in zero-shot performance can be achieved with fairly simple means.
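For orientation, the likelihood-ratio baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's corrected estimator: the function name `log_likelihood_ratio`, the amino acid ordering, and the random toy log-probabilities are all assumptions made for the example, and in practice the log-probabilities would come from an inverse folding model conditioned on the backbone structure. Treating a higher score as "more stabilizing" is the usual heuristic reading, not a derived result.

```python
import numpy as np

# Assumed alphabet ordering for the 20 standard amino acids (hypothetical;
# a real model defines its own token order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_likelihood_ratio(log_probs, position, wild_type, mutant):
    """Score one substitution as log p(mutant) - log p(wild_type) at a position.

    log_probs: array of shape (L, 20) holding per-position log-probabilities
    over AMINO_ACIDS, assumed to be produced by an inverse folding model.
    """
    wt = AMINO_ACIDS.index(wild_type)
    mt = AMINO_ACIDS.index(mutant)
    return float(log_probs[position, mt] - log_probs[position, wt])


# Toy stand-in for model output: random logits normalized with a log-softmax.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 20))
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

# Heuristic zero-shot score for substituting A -> W at (0-based) position 2.
score = log_likelihood_ratio(log_probs, position=2, wild_type="A", mutant="W")
print(f"A->W at position 2: {score:+.3f}")
```

By construction the score is antisymmetric (reversing the substitution flips the sign) and is zero when the mutant equals the wild type; the abstract's point is that such a ratio is only a rough proxy for the relative stability.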