PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design

📅 2025-06-13
🤖 AI Summary
To address the challenge of designing high-affinity binding proteins *de novo* for arbitrary protein targets, this paper introduces PPDiff, a non-autoregressive diffusion model that jointly generates binder sequences and 3D structures in a hybrid sequence-structure space. PPDiff builds on SSINC, a sequence-structure interleaving network that combines global self-attention layers, k-nearest-neighbor (kNN) equivariant graph layers, and causal attention layers. The authors also curate PPBench, a large-scale, general-purpose protein-protein complex dataset of 706,360 complexes from the Protein Data Bank (PDB). PPDiff is pretrained on PPBench and then fine-tuned on two downstream applications: target-protein mini-binder complex design and antigen-antibody complex design. It achieves success rates of 50.00% on the pretraining task, 23.16% on mini-binder design, and 16.89% on antigen-antibody design, outperforming the baseline methods.

📝 Abstract
Designing protein-binding proteins with high affinity is critical in biomedical research and biotechnology. Despite recent advancements targeting specific proteins, the ability to create high-affinity binders for arbitrary protein targets on demand, without extensive rounds of wet-lab testing, remains a significant challenge. Here, we introduce PPDiff, a diffusion model to jointly design the sequence and structure of binders for arbitrary protein targets in a non-autoregressive manner. PPDiff builds upon our developed Sequence Structure Interleaving Network with Causal attention layers (SSINC), which integrates interleaved self-attention layers to capture global amino acid correlations, k-nearest neighbor (kNN) equivariant graph layers to model local interactions in three-dimensional (3D) space, and causal attention layers to simplify the intricate interdependencies within the protein sequence. To assess PPDiff, we curate PPBench, a general protein-protein complex dataset comprising 706,360 complexes from the Protein Data Bank (PDB). The model is pretrained on PPBench and finetuned on two real-world applications: target-protein mini-binder complex design and antigen-antibody complex design. PPDiff consistently surpasses baseline methods, achieving success rates of 50.00%, 23.16%, and 16.89% for the pretraining task and the two downstream applications, respectively.
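
The abstract distinguishes global self-attention layers from causal attention layers. The sketch below is a minimal illustration of that difference, not the SSINC implementation: it assumes a single attention head with identity projections, and the function name `attention` and the toy input are hypothetical. With the causal mask, residue i can attend only to residues at positions j <= i; without it, every residue attends to every other.

```python
import numpy as np

def attention(x, causal=False):
    """Single-head scaled dot-product self-attention.

    Illustrative assumption: queries, keys, and values are all x itself
    (identity projections), so no learned weights are involved.
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    if causal:
        # Lower-triangular mask: position i may attend only to j <= i.
        mask = np.tril(np.ones((n, n), dtype=bool))
        scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

# Toy input: 5 residues with 8-dimensional features (random, for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))

_, w_global = attention(x)               # every residue sees every residue
_, w_causal = attention(x, causal=True)  # residue i sees residues 0..i only
```

In the causal case the attention matrix is lower triangular, which is one way to read the abstract's statement that causal layers simplify interdependencies within the sequence.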
Problem

Research questions and friction points this paper is trying to address.

Design high-affinity protein binders for arbitrary targets
Jointly model protein sequence and structure non-autoregressively
Reduce reliance on extensive wet-lab testing in protein-protein complex design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion model for sequence-structure joint design
SSINC network with interleaved attention layers
kNN graph layers for 3D local interactions
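
As a rough illustration of the neighborhood structure a kNN graph layer operates on, the sketch below builds a k-nearest-neighbor graph from 3D coordinates. It is a minimal sketch under stated assumptions, not the paper's layer: the function name `knn_graph`, the toy coordinates, and the choice of Euclidean distance between one point per residue are all hypothetical.

```python
import numpy as np

def knn_graph(coords, k):
    """Return an (n, k) array holding, for each point, the indices of its
    k nearest neighbors by Euclidean distance, excluding the point itself."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

# Toy coordinates: four points on the x-axis (illustrative values only).
coords = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [2.5, 0.0, 0.0],
    [6.0, 0.0, 0.0],
])
neighbors = knn_graph(coords, k=2)

# Rotating the whole structure (90 degrees about the z-axis) preserves
# pairwise distances, so the neighbor graph is unchanged.
rotation = np.array([
    [0.0, -1.0, 0.0],
    [1.0,  0.0, 0.0],
    [0.0,  0.0, 1.0],
])
neighbors_rotated = knn_graph(coords @ rotation.T, k=2)
```

Because the graph depends only on pairwise distances, it is unaffected by rigid rotations of the complex, which is consistent with using it inside equivariant layers.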