🤖 AI Summary
Existing diffusion models for protein structure generation often produce physically implausible conformations because their noising processes are not grounded in physical principles.
Method: We propose a physics-guided nonlinear noising scheme that embeds classical mechanical constraints (such as bond lengths, bond angles, and secondary-structure continuity) into an SE(3)-equivariant flow-matching framework, enabling physically driven, topology-preserving protein unfolding and generation. Our method jointly encodes sequence information and backbone geometry, supporting high-fidelity 3D conformation generation conditioned on amino acid sequences.
Results: In unconditional generation, our approach achieves state-of-the-art performance, improving structural diversity, physical validity, and designability. It also accurately folds monomeric sequences into native-like conformations. These results point toward programmable protein design grounded in physical principles.
📝 Abstract
Protein structure prediction and folding are fundamental to understanding biology, and recent deep learning advances have reshaped the field. Diffusion-based generative models have revolutionized protein design, enabling the creation of novel proteins. However, these methods often neglect the intrinsic physical realism of proteins because their noising dynamics lack grounding in physical principles. To address this, we first introduce a physically motivated non-linear noising process, grounded in classical physics, that unfolds proteins into secondary structures (e.g., alpha helices, linear beta sheets) while preserving topological integrity: maintaining bonds and preventing collisions. We then integrate this process with the flow-matching paradigm on SE(3) to model the invariant distribution of protein backbones with high fidelity, incorporating sequence information to enable sequence-conditioned folding and expand the generative capabilities of our model. Experimental results demonstrate that the proposed method achieves state-of-the-art performance in unconditional protein generation, producing more designable and novel protein structures while accurately folding monomer sequences into precise protein conformations.
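The abstract does not spell out how topological integrity is measured. As a rough illustration only, the two properties it names (bonds maintained, collisions prevented) could be checked on a C-alpha trace as sketched below. The function name `topology_report`, the tolerance, and the clash cutoff are hypothetical choices made for this sketch and are not taken from the paper; the 3.8 Å spacing is the usual distance between consecutive C-alpha atoms.

```python
import math

# Assumed values for illustration; the paper's actual constraints may differ.
CA_CA_BOND = 3.8    # typical distance (angstroms) between consecutive C-alpha atoms
BOND_TOL = 0.2      # hypothetical tolerance on that distance
CLASH_CUTOFF = 3.0  # hypothetical minimum distance between non-adjacent C-alpha atoms


def topology_report(coords):
    """Return (bonds_ok, clash_free) for a list of (x, y, z) C-alpha coordinates."""
    n = len(coords)
    # Every consecutive pair must sit near the ideal virtual-bond length.
    bonds_ok = all(
        abs(math.dist(coords[i], coords[i + 1]) - CA_CA_BOND) <= BOND_TOL
        for i in range(n - 1)
    )
    # Every non-adjacent pair must stay farther apart than the clash cutoff.
    clash_free = all(
        math.dist(coords[i], coords[j]) >= CLASH_CUTOFF
        for i in range(n)
        for j in range(i + 2, n)
    )
    return bonds_ok, clash_free


# A fully extended chain along the x-axis: bonds intact, no collisions.
extended = [(3.8 * i, 0.0, 0.0) for i in range(5)]

# A chain that walks around a square and lands back on its first residue:
# every bond is intact, but residues 0 and 4 collide.
folded = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
    (0.0, 0.0, 0.0),
]

print(topology_report(extended))
print(topology_report(folded))
```

A noising process that preserves topology in the sense described above would keep both flags true at every intermediate step, whereas independent Gaussian noise on coordinates generally would not.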