Robust Inference-Time Steering of Protein Diffusion Models via Embedding Optimization

📅 2026-02-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses biophysical inverse problems in which the target protein conformation lies in a low-density region of the diffusion model's prior. In that regime, conventional coordinate-based posterior sampling depends on strong, unstable likelihood guidance and fails to robustly generate structures consistent with experimental constraints. The authors propose EmbedOpt, which steers protein diffusion models at inference time by optimizing in the conditional embedding space rather than over atomic coordinates, exploiting the sequence and co-evolutionary information encoded there. According to the reported benchmarks, EmbedOpt outperforms coordinate-based posterior sampling in cryo-EM map fitting, matches it on distance constraint tasks, remains robust across hyperparameters spanning two orders of magnitude, and needs fewer diffusion steps for inference.

📝 Abstract

In many biophysical inverse problems, the goal is to generate biomolecular conformations that are both physically plausible and consistent with experimental measurements. As recent sequence-to-structure diffusion models provide powerful data-driven priors, posterior sampling has emerged as a popular framework by guiding atomic coordinates to target conformations using experimental likelihoods. However, when the target lies in a low-density region of the prior, posterior sampling requires aggressive and brittle weighting of the likelihood guidance. Motivated by this limitation, we propose EmbedOpt, an alternative inference-time approach for steering diffusion models to optimize experimental likelihoods in the conditional embedding space. As this space encodes rich sequence and coevolutionary signals, optimizing over it effectively shifts the diffusion prior to align with experimental constraints. We validate EmbedOpt on two benchmarks simulating cryo-electron microscopy map fitting and experimental distance constraints. We show that EmbedOpt outperforms the coordinate-based posterior sampling method in map fitting tasks, matches performance on distance constraint tasks, and exhibits superior engineering robustness across hyperparameters spanning two orders of magnitude. Moreover, its smooth optimization behavior enables a significant reduction in the number of diffusion steps required for inference, leading to better efficiency.
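The contrast the abstract draws between guiding coordinates and optimizing the conditioning embedding can be illustrated with a deliberately tiny scalar toy. This is not the paper's algorithm: the "diffusion prior" is assumed to be a Gaussian centred on the embedding, the "experimental likelihood" is a Gaussian around a measurement `y`, sampling noise is dropped so the result is deterministic, and every name (`denoise`, `coordinate_guidance`, `embedding_optimization`, `weight`, `lr`) is hypothetical. The sketch only shows why a target far from the prior mean pushes coordinate guidance toward large, unstable weights, while moving the embedding shifts the prior itself.

```python
# Toy illustration only: a scalar stand-in for a conditional diffusion prior.
# All functions and parameters are hypothetical, not taken from the paper.

def denoise(c):
    """Assumed toy 'model': the prior mean of the coordinate equals the embedding c."""
    return c

def coordinate_guidance(y, c, weight, steps=50, step_size=0.1,
                        prior_var=1.0, noise_var=1.0):
    """Noise-free guided update on the coordinate x (posterior-sampling style)."""
    x = denoise(c)
    for _ in range(steps):
        prior_score = (denoise(c) - x) / prior_var   # pulls x back to the prior mean
        lik_score = (y - x) / noise_var              # pulls x toward the measurement
        x += step_size * (prior_score + weight * lik_score)
    return x

def embedding_optimization(y, c, steps=50, lr=0.1, noise_var=1.0):
    """Gradient descent on the embedding c; the coordinate is always denoise(c)."""
    for _ in range(steps):
        x = denoise(c)
        grad_c = (x - y) / noise_var   # d/dc of (denoise(c) - y)^2 / (2 * noise_var)
        c -= lr * grad_c
    return c, denoise(c)

y_target, c_init = 10.0, 0.0   # the target sits far from the prior mean

x_weak = coordinate_guidance(y_target, c_init, weight=1.0)
x_strong = coordinate_guidance(y_target, c_init, weight=100.0)
c_opt, x_embed = embedding_optimization(y_target, c_init)

print("coordinate guidance, weight=1:  ", x_weak)
print("coordinate guidance, weight=100:", x_strong)
print("embedding optimization:         ", x_embed)
```

In this toy, the weak guidance weight settles halfway between the prior mean and the target, the large weight makes the fixed-step update diverge, and the embedding update approaches the target smoothly with the same step size. Real protein diffusion models have high-dimensional embeddings and stochastic samplers, so the numbers here carry no quantitative meaning.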

Problem

Research questions and friction points this paper is trying to address.

protein diffusion models

biophysical inverse problems

posterior sampling

experimental constraints

conformation generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Embedding Optimization

Diffusion Models

Protein Structure