Inference-time optimization for experiment-grounded protein ensemble generation

📅 2026-02-27
🤖 AI Summary
Experiment-guided protein generators are limited by fixed sampling horizons and sensitivity to initialization, which hinders the production of thermodynamically plausible ensembles that agree with experimental data. This work proposes an inference-time optimization framework that operates on latent representations to maximize ensemble log-likelihood, and that samples Boltzmann-weighted conformational ensembles by combining AlphaFold3 structural priors, force-field-based priors, and experimental likelihoods. The approach removes the dependence on diffusion length and on specific initializations. Evaluated on X-ray crystallography and NMR data, it consistently outperforms state-of-the-art guidance in diversity, physical energy, and agreement with data, often fitting the experimental data better than deposited PDB structures. Additionally, experiments that maximize ipTM show that perturbing AlphaFold3 embeddings can artificially inflate model confidence, exposing a vulnerability in current design metrics.

📝 Abstract
Protein function relies on dynamic conformational ensembles, yet current generative models like AlphaFold3 often fail to produce ensembles that match experimental data. Recent experiment-guided generators attempt to address this by steering the reverse diffusion process. However, these methods are limited by fixed sampling horizons and sensitivity to initialization, often yielding thermodynamically implausible results. We introduce a general inference-time optimization framework to solve these challenges. First, we optimize over latent representations to maximize ensemble log-likelihood, rather than perturbing structures post hoc. This approach eliminates dependence on diffusion length, removes initialization bias, and easily incorporates external constraints. Second, we present novel sampling schemes for drawing Boltzmann-weighted ensembles. By combining structural priors from AlphaFold3 with force-field-based priors, we sample from their product distribution while balancing experimental likelihoods. Our results show that this framework consistently outperforms state-of-the-art guidance, improving diversity, physical energy, and agreement with data in X-ray crystallography and NMR, often fitting the experimental data better than deposited PDB structures. Finally, inference-time optimization experiments maximizing ipTM scores reveal that perturbing AlphaFold3 embeddings can artificially inflate model confidence. This exposes a vulnerability in current design metrics, whose mitigation could offer a pathway to reduce false discovery rates in binder engineering.
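The product-distribution idea in the abstract can be illustrated with a toy example. The sketch below is not the paper's algorithm and does not touch AlphaFold3 latents. It assumes a 1-D coordinate, a Gaussian stand-in for the structural prior, a double-well potential as the "force field", and a Gaussian likelihood on the ensemble mean. All function names and parameters (`beta`, `lam`, `noise`) are hypothetical. It runs unadjusted Langevin dynamics on a target proportional to prior × exp(−βU) × likelihood^λ.

```python
import numpy as np

def grad_log_prior(x, mu=0.0, sigma=1.5):
    # Assumed stand-in for a learned structural prior: a Gaussian in 1-D.
    return -(x - mu) / sigma**2

def grad_energy(x):
    # Toy double-well "force field": U(x) = (x^2 - 1)^2.
    return 4.0 * x * (x**2 - 1.0)

def grad_log_likelihood(x, y_obs, noise=0.1):
    # The observable is the ensemble mean; log L = -(mean(x) - y_obs)^2 / (2 noise^2).
    # Its gradient with respect to each member is the same scalar.
    g = -(x.mean() - y_obs) / (noise**2 * x.size)
    return np.full_like(x, g)

def sample_ensemble(y_obs, n=256, steps=2000, step=1e-3, beta=1.0, lam=None, seed=0):
    # Unadjusted Langevin dynamics on the whole ensemble at once.
    # lam defaults to n so the per-member restraint force does not shrink
    # as the ensemble grows (a design choice made for this sketch only).
    lam = float(n) if lam is None else lam
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    for _ in range(steps):
        drift = (grad_log_prior(x)
                 - beta * grad_energy(x)
                 + lam * grad_log_likelihood(x, y_obs))
        x = x + step * drift + np.sqrt(2.0 * step) * rng.normal(size=n)
    return x

ensemble = sample_ensemble(y_obs=0.5)
print("ensemble mean:", ensemble.mean())
print("fraction in right-hand well:", (ensemble > 0).mean())
```

In this toy setting the restraint acts on the ensemble average rather than on each member, so individual members remain free to occupy either well while the populations shift to match the observation. Raising `lam` tightens agreement with `y_obs` at the expense of the prior and energy terms, which mirrors the balance between priors and experimental likelihoods described in the abstract.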

Problem

Research questions and friction points this paper is trying to address.

protein ensemble generation
experimental data agreement
conformational dynamics
thermodynamic plausibility
generative modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

inference-time optimization
protein ensemble generation
experiment-guided sampling
Boltzmann-weighted ensembles
latent space optimization