🤖 AI Summary
Existing protein inverse folding methods prioritize sequence recovery while neglecting energetic stability, so the sequences they design can be energetically suboptimal. To address this, the authors propose EnerBridge-DPO, a framework that integrates Markov bridges with direct preference optimization (DPO) and adds an explicit energy-constraint loss, enabling energy-driven generation and quantitative ΔΔG prediction. The method comprises three core components: structure-aware energy representation learning, energy-guided sequence prior modeling, and DPO-based preference alignment. In the reported evaluations, it designs protein complex sequences with lower energy while maintaining sequence recovery rates comparable to state-of-the-art models, and it accurately predicts ΔΔG values between sequences. The work points toward energy-aware design of high-stability proteins.
📝 Abstract
Designing protein sequences with optimal energetic stability is a key challenge in protein inverse folding: current deep learning methods are trained primarily to maximize sequence recovery rates and often neglect the energy of the generated sequences. To overcome this limitation, we propose EnerBridge-DPO, a novel inverse folding framework that directly generates low-energy, high-stability protein sequences. It rests on two core innovations. First, we integrate Markov Bridges with Direct Preference Optimization (DPO), using energy-based preferences to fine-tune the Markov Bridge model. The Markov Bridge starts optimization from an information-rich prior sequence, providing DPO with a pool of structurally plausible sequence candidates. Second, we introduce an explicit energy constraint loss that strengthens the energy-driven nature of DPO on top of the prior sequences, enabling the model to learn energy representations from this prior knowledge and to predict sequence energy values directly, thereby capturing quantitative features of the energy landscape. Our evaluations show that EnerBridge-DPO designs protein complex sequences with lower energy while maintaining sequence recovery rates comparable to state-of-the-art models, and that it accurately predicts $\Delta\Delta G$ values between sequences.
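
To make the two ingredients above concrete, here is a minimal sketch of how an energy-based preference pair could feed a DPO objective combined with an energy constraint term. The abstract does not give the exact loss, so everything here is an assumption: the standard DPO formulation (negative log-sigmoid of the scaled log-ratio margin) stands in for the preference loss, a squared error stands in for the energy constraint, and the names `beta`, `lam`, `energy_preference_pair`, and `total_loss` are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def energy_preference_pair(seq_a, energy_a, seq_b, energy_b):
    """Order two candidates as (preferred, rejected); lower energy is preferred."""
    if energy_a < energy_b:
        return seq_a, seq_b
    return seq_b, seq_a


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair (assumed form, not confirmed by the source).

    logp_w / logp_l: policy log-likelihoods of the preferred / rejected sequence.
    ref_logp_w / ref_logp_l: the same quantities under the frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


def energy_constraint_loss(pred_energy, target_energy):
    """Squared error between predicted and reference energy (assumed form)."""
    return (pred_energy - target_energy) ** 2


def total_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
               pred_energy_w, target_energy_w, beta=0.1, lam=1.0):
    """Preference loss plus a weighted energy term; lam is a hypothetical weight."""
    return (dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)
            + lam * energy_constraint_loss(pred_energy_w, target_energy_w))


# Toy usage with made-up sequences, energies, and log-likelihoods.
winner, loser = energy_preference_pair("MKTAY", -12.5, "MKSAY", -9.0)
loss = total_loss(logp_w=-1.0, logp_l=-3.0, ref_logp_w=-2.0, ref_logp_l=-2.0,
                  pred_energy_w=-12.0, target_energy_w=-12.5)
```

Under this sketch, a model that predicts per-sequence energies could estimate $\Delta\Delta G$ between two sequences as the difference of their predicted energies; whether EnerBridge-DPO computes it this way is not stated in the abstract.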