Protein Autoregressive Modeling via Multiscale Structure Generation

📅 2026-02-04
🤖 AI Summary
This work proposes the first multi-scale autoregressive framework for protein backbone generation. It targets two sources of degraded generation quality in autoregressive approaches: the lack of multi-scale modeling and exposure bias. The method generates structures coarse to fine, in the manner of sculpting a statue, and combines multi-scale downsampling, an autoregressive Transformer, and a flow-based backbone decoder. To mitigate exposure bias, it incorporates noisy context learning and scheduled sampling. The model produces high-quality unconditional backbones, shows favorable scaling behavior and strong zero-shot generalization, and accepts human-provided prompts, enabling zero-shot conditional generation and motif scaffolding without fine-tuning.
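The multi-scale downsampling step can be illustrated with a minimal sketch. The summary does not say which downsampling operator PAR uses, so the average pooling over contiguous residue windows below is an assumption, and the names `downsample`, `multiscale_pyramid`, and `scale_lengths` are hypothetical.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]  # one 3D point per residue (e.g. a C-alpha atom)


def downsample(coords: Sequence[Coord], target_len: int) -> List[Coord]:
    """Average contiguous residue windows into `target_len` coarse points.

    Assumption: average pooling stands in for the paper's unspecified
    downsampling operator.
    """
    n = len(coords)
    if not 1 <= target_len <= n:
        raise ValueError("target_len must be between 1 and len(coords)")
    pooled = []
    for i in range(target_len):
        # Integer arithmetic splits the chain into near-equal, non-empty windows.
        start = i * n // target_len
        end = (i + 1) * n // target_len
        window = coords[start:end]
        pooled.append(tuple(sum(c[k] for c in window) / len(window) for k in range(3)))
    return pooled


def multiscale_pyramid(coords: Sequence[Coord], scale_lengths: Sequence[int]) -> List[List[Coord]]:
    """Return one downsampled copy of the backbone per requested scale, coarse to fine."""
    return [downsample(coords, length) for length in scale_lengths]


# Toy backbone: 8 residues on a straight line along the x axis.
backbone = [(float(i), 0.0, 0.0) for i in range(8)]
pyramid = multiscale_pyramid(backbone, [1, 2, 4, 8])
```

Here the coarsest scale is a single centroid and the finest scale is the original chain, so an autoregressive model trained on such a pyramid would predict each scale from the coarser ones before it.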

📝 Abstract
We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction. Exploiting the hierarchical nature of proteins, PAR generates structures in a way that resembles sculpting a statue: it first forms a coarse topology and then refines structural details across scales. To achieve this, PAR consists of three key components: (i) multi-scale downsampling operations that represent protein structures across multiple scales during training; (ii) an autoregressive transformer that encodes multi-scale information and produces conditional embeddings to guide structure generation; and (iii) a flow-based backbone decoder that generates backbone atoms conditioned on these embeddings. Autoregressive models also suffer from exposure bias, caused by the mismatch between the training and generation procedures, which substantially degrades structure generation quality. We alleviate this issue by adopting noisy context learning and scheduled sampling, enabling robust backbone generation. Notably, PAR exhibits strong zero-shot generalization, supporting flexible human-prompted conditional generation and motif scaffolding without fine-tuning. On the unconditional generation benchmark, PAR effectively learns protein distributions, produces backbones of high design quality, and exhibits favorable scaling behavior. Together, these properties establish PAR as a promising framework for protein structure generation.
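To make the exposure-bias remedy concrete, the following is a rough sketch of how noisy context learning and scheduled sampling could be combined when building the training context for next-scale prediction. The abstract gives no details, so everything here is an assumption: the function `build_training_context`, the callback `predict_fn`, the parameters `sample_prob` and `noise_std`, and the flattened-list representation of a scale are all hypothetical.

```python
import random
from typing import Callable, List, Sequence

Scale = List[float]  # flattened coordinates of one scale (hypothetical layout)


def build_training_context(
    gt_scales: Sequence[Scale],
    predict_fn: Callable[[List[Scale], int], Scale],
    sample_prob: float,
    noise_std: float,
    rng: random.Random,
) -> List[Scale]:
    """Build the coarse-scale context the model is conditioned on during training.

    Scheduled sampling: with probability `sample_prob`, a context scale comes
    from the model's own prediction instead of the ground truth.
    Noisy context learning: Gaussian noise is added to every context scale.
    Both are assumptions about how these techniques might be applied here.
    """
    context: List[Scale] = []
    # The finest scale is only ever a prediction target, never context.
    for gt in gt_scales[:-1]:
        if rng.random() < sample_prob:
            source = predict_fn(context, len(gt))  # model's own guess
        else:
            source = gt  # teacher forcing
        context.append([x + rng.gauss(0.0, noise_std) for x in source])
    return context


def dummy_predict(context: List[Scale], length: int) -> Scale:
    """Stand-in for a trained model; always predicts -1.0."""
    return [-1.0] * length


scales = [[0.0], [1.0, 2.0], [3.0, 4.0, 5.0, 6.0]]
rng = random.Random(0)
teacher_forced = build_training_context(scales, dummy_predict, 0.0, 0.0, rng)
self_fed = build_training_context(scales, dummy_predict, 1.0, 0.0, rng)
noisy = build_training_context(scales, dummy_predict, 0.0, 0.1, rng)
```

With `sample_prob=0.0` and `noise_std=0.0` this reduces to plain teacher forcing. Raising either value exposes the model during training to the kind of imperfect context it sees at generation time, which is the mismatch the abstract calls exposure bias.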
Problem

Research questions and friction points this paper is trying to address.

protein structure generation
autoregressive modeling
multi-scale generation
exposure bias
backbone generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

protein autoregressive modeling
multiscale structure generation
exposure bias mitigation
zero-shot generalization
flow-based backbone decoder