Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute

📅 2026-03-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the split between conditional generative modeling and “hallucination”-based sequence optimization in structure-based de novo protein binder design by introducing Proteina-Complexa, a fully atomistic method that unifies the two paradigms. Proteina-Complexa extends flow-based latent protein generation architectures and combines generative pretraining with inference-time optimization. The authors also construct Teddymer, a large-scale synthetic dataset of binder-target pairs derived from domain-domain interactions in computationally predicted monomeric structures, and use it for pretraining alongside high-quality experimental multimers. On computational benchmarks, Proteina-Complexa achieves markedly higher in-silico success rates than existing generative approaches, and its test-time optimization strategies outperform previous hallucination methods under normalized compute budgets. The framework also extends to small-molecule targets and enzyme design tasks, where it again surpasses prior methods.

📝 Abstract

Protein interaction modeling is central to protein design, which has been transformed by machine learning with applications in drug discovery and beyond. In this landscape, structure-based de novo binder design is cast as either conditional generative modeling or sequence optimization via structure predictors ("hallucination"). We argue that this is a false dichotomy and propose Proteina-Complexa, a novel fully atomistic binder generation method unifying both paradigms. We extend recent flow-based latent protein generation architectures and leverage the domain-domain interactions of monomeric computationally predicted protein structures to construct Teddymer, a new large-scale dataset of synthetic binder-target pairs for pretraining. Combined with high-quality experimental multimers, this enables training a strong base model. We then perform inference-time optimization with this generative prior, unifying the strengths of previously distinct generative and hallucination methods. Proteina-Complexa sets a new state of the art in computational binder design benchmarks: it delivers markedly higher in-silico success rates than existing generative approaches, and our novel test-time optimization strategies greatly outperform previous hallucination methods under normalized compute budgets. We also demonstrate interface hydrogen bond optimization, fold class-guided binder generation, and extensions to small molecule targets and enzyme design tasks, again surpassing prior methods. Code, models and new data will be publicly released.

Problem

Research questions and friction points this paper is trying to address.

protein binder design

de novo protein design

atomistic modeling

protein-protein interactions

computational protein design

Innovation

Methods, ideas, or system contributions that make the work stand out.

atomistic binder design

generative pretraining

test-time optimization