Protein generation with embedding learning for motif diversification

📅 2025-10-21
🤖 AI Summary
Protein design faces a trade-off between structural diversity and preservation of functional motifs. Existing methods, such as partial diffusion in RFdiffusion, either produce motifs nearly identical to the native structure under small perturbations or violate the geometric constraints needed for function under large perturbations. To address this, we propose PGEL, a framework that learns a representation of a functional motif in the embedding space of a diffusion model's frozen denoiser and then perturbs that representation in a controlled way, rather than perturbing coordinates directly. The learned embeddings encode both sequence and structural features of the motif. In experiments on a monomer, a protein-protein interface, and a cancer-related transcription factor complex, PGEL achieves greater structural diversity, better designability, and improved self-consistency than partial diffusion.

📝 Abstract
A fundamental challenge in protein design is the trade-off between generating structural diversity and preserving the biological function of a motif. Current state-of-the-art methods, such as partial diffusion in RFdiffusion, often fail to resolve this trade-off: small perturbations yield motifs nearly identical to the native structure, whereas larger perturbations violate the geometric constraints necessary for biological function. We introduce Protein Generation with Embedding Learning (PGEL), a general framework that learns high-dimensional embeddings encoding sequence and structural features of a target motif in the representation space of a diffusion model's frozen denoiser, and then enhances motif diversity by introducing controlled perturbations in the embedding space. PGEL is thus able to loosen geometric constraints while satisfying typical design metrics, leading to more diverse yet viable structures. We demonstrate PGEL on three representative cases: a monomer, a protein-protein interface, and a cancer-related transcription factor complex. In all cases, PGEL achieves greater structural diversity, better designability, and improved self-consistency than partial diffusion. Our results establish PGEL as a general strategy for embedding-driven protein generation that allows systematic, viable diversification of functional motifs.
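The learn-then-perturb idea described in the abstract can be illustrated with a deliberately small sketch. This is not the PGEL implementation and does not use RFdiffusion: it assumes a toy fixed linear map as a stand-in for the frozen denoiser, and the names `FROZEN_WEIGHTS`, `frozen_denoiser`, `learn_embedding`, `perturb_embedding`, and `distance` are hypothetical. It only shows the general pattern of optimising an embedding while the model stays frozen, then adding noise of a chosen scale in embedding space.

```python
import math
import random

# Hypothetical toy stand-in for a frozen denoiser: a fixed linear map from a
# 4-dimensional embedding to 3 "motif features". These weights are never updated.
FROZEN_WEIGHTS = [
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.5],
    [0.5, 0.0, 1.0, 0.0],
]


def frozen_denoiser(embedding):
    """Map an embedding to motif features using the fixed weights."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in FROZEN_WEIGHTS]


def learn_embedding(target, steps=2000, lr=0.1):
    """Fit only the embedding by gradient descent on squared reconstruction error."""
    dim = len(FROZEN_WEIGHTS[0])
    embedding = [0.0] * dim
    for _ in range(steps):
        residual = [p - t for p, t in zip(frozen_denoiser(embedding), target)]
        # Gradient of 0.5 * ||W e - target||^2 with respect to e is W^T residual.
        grad = [
            sum(row[j] * r for row, r in zip(FROZEN_WEIGHTS, residual))
            for j in range(dim)
        ]
        embedding = [e - lr * g for e, g in zip(embedding, grad)]
    return embedding


def perturb_embedding(embedding, scale, rng):
    """Add isotropic Gaussian noise of the given scale in embedding space."""
    return [e + scale * rng.gauss(0.0, 1.0) for e in embedding]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    target = [1.0, -0.5, 0.25]  # made-up motif features for illustration
    learned = learn_embedding(target)
    rng = random.Random(0)
    for scale in (0.0, 0.1, 0.5):
        variant = perturb_embedding(learned, scale, rng)
        print(scale, distance(frozen_denoiser(variant), target))
```

In this toy setting the noise scale plays the role of the diversity control: a scale of zero reproduces the learned motif representation, and larger scales move the decoded features further from the target. How PGEL actually parameterises and bounds its perturbations is not specified in the text above.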
Problem

Research questions and friction points this paper is trying to address.

Balancing structural diversity with motif function preservation
Avoiding violations of function-critical geometric constraints under large perturbations
Generating viable protein structures with enhanced diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns embeddings encoding protein sequence and structure features
Introduces controlled perturbations in embedding space for diversity
Loosens geometric constraints while still satisfying typical design metrics
Kevin Michalewicz
Centre for AI, Data Science & Artificial Intelligence, Biopharma R&D, AstraZeneca, UK
Chen Jin
Centre for AI, Data Science & Artificial Intelligence, Biopharma R&D, AstraZeneca, UK
P. Teare
Centre for AI, Data Science & Artificial Intelligence, Biopharma R&D, AstraZeneca, UK
Tom Diethe
AstraZeneca; University of Bristol
Machine Learning, Computational Biology, Drug Development, Privacy Enhancing Technologies
Mauricio Barahona
Imperial College London, Applied Mathematics, Chair in Biomathematics
networks, graph-based learning, biomaths & comp bio, stochastic processes, applied dynamical systems
Barbara Bravi
Imperial College London
Mathematical Biology, Machine Learning, Statistical Physics
A. Mullokandov
Centre for AI, Data Science & Artificial Intelligence, Biopharma R&D, AstraZeneca, UK