Distilled Protein Backbone Generation

📅 2025-10-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion- and flow-based protein backbone generation methods achieve high structural fidelity but require hundreds of iterative steps, resulting in prohibitively slow inference that hinders large-scale de novo protein design. Method: This work applies Score identity Distillation (SiD) to protein generation for the first time, proposing an efficient few-step sampling framework that combines multistep generation with inference-time noise modulation to accelerate sampling without compromising teacher-model performance. Contribution/Results: The distilled model generates high-quality backbones in only ~5 steps, over 20× faster than conventional diffusion models, while preserving strong designability, structural diversity, and novelty. This reduction in computational cost significantly advances the practical deployment of diffusion models in real-world protein engineering applications.

📝 Abstract
Diffusion- and flow-based generative models have recently demonstrated strong performance in protein backbone generation tasks, offering unprecedented capabilities for de novo protein design. However, while achieving notable generation quality, these models are limited by their generation speed, often requiring hundreds of iterative steps in the reverse-diffusion process. This computational bottleneck limits their practical utility in large-scale protein discovery, where thousands to millions of candidate structures are needed. To address this challenge, we explore score distillation techniques, which have shown great success in reducing the number of sampling steps in the vision domain while maintaining high generation quality. However, a straightforward adaptation of these methods results in unacceptably low designability. Through extensive study, we have identified how to appropriately adapt Score identity Distillation (SiD), a state-of-the-art score distillation strategy, to train few-step protein backbone generators that significantly reduce sampling time while maintaining performance comparable to their pretrained teacher model. In particular, multistep generation combined with inference-time noise modulation is key to this success. We demonstrate that our distilled few-step generators achieve more than a 20-fold improvement in sampling speed while achieving similar levels of designability, diversity, and novelty as the Proteina teacher model. This reduction in inference cost enables large-scale in silico protein design, thereby bringing diffusion-based models closer to real-world protein engineering applications.
Problem

Research questions and friction points this paper is trying to address.

Accelerating slow protein backbone generation from diffusion models
Maintaining designability while reducing sampling steps in generation
Enabling large-scale protein discovery through efficient backbone generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distilled few-step protein backbone generators
Score identity Distillation adapted for proteins
Multistep generation with noise modulation
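The paper's code is not reproduced on this page; the sketch below only illustrates the general shape of few-step sampling with inference-time noise modulation, assuming a generic distilled generator `student` and a decreasing schedule of noise levels. The function names, the re-noising rule, and the schedule values are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def few_step_sample(student, x_init, noise_levels, rng):
    """Few-step generation with inference-time noise modulation (sketch).

    student: callable (x_t, t) -> estimate of the clean sample x_0
             (stands in for a distilled few-step backbone generator)
    noise_levels: decreasing noise scales, e.g. [1.0, 0.6, 0.3, 0.1, 0.0]

    After each denoising step, fresh Gaussian noise at the next (smaller)
    level is injected before the following student call, so the generator
    is queried a handful of times instead of hundreds.
    """
    x = x_init
    for t, t_next in zip(noise_levels[:-1], noise_levels[1:]):
        x0_hat = student(x, t)  # one-shot clean estimate at noise level t
        if t_next > 0:
            # inference-time noise modulation: partially re-noise the estimate
            x = x0_hat + t_next * rng.standard_normal(x0_hat.shape)
        else:
            x = x0_hat  # final step: return the clean estimate
    return x
```

With a 5-entry schedule this makes 4 student calls, versus the hundreds of reverse-diffusion steps the summary above contrasts against.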
Liyang Xie
The University of Texas at Austin
Haoran Zhang
The University of Texas at Austin
Zhendong Wang
The University of Texas at Austin
Wesley Tansey
Memorial Sloan Kettering Cancer Center
Machine Learning, Bayesian Statistics, Deep Learning, Hypothesis Testing, Computational Biology
Mingyuan Zhou
The University of Texas at Austin