Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings

📅 2025-06-20
🤖 AI Summary
To address the challenge of achieving both conformational diversity and atomic-level accuracy when generating all-atom structures of dynamic proteins such as GPCRs, this work introduces LD-FPG, a latent diffusion model over graph embeddings for all-atom protein structure generation. It uses a Chebyshev graph neural network (ChebNet) to embed conformations that explicitly include every side-chain heavy atom, compares three pooling strategies (blind, sequential, and residue-based), and pairs the diffusion model with a decoder that can optionally be regularized by dihedral-angle losses. The model learns conformational distributions directly from molecular dynamics (MD) trajectories. On the D2R-MD dataset, the sequential and residue-based pooling strategies reach an all-atom lDDT of approximately 0.7 and a Cα-lDDT of approximately 0.8, with a Jensen-Shannon divergence below 0.03 for backbone and side-chain dihedral-angle distributions relative to the MD data. Code and data are publicly released.

📝 Abstract
Generating diverse, all-atom conformational ensembles of dynamic proteins such as G-protein-coupled receptors (GPCRs) is critical for understanding their function, yet most generative models simplify atomic detail or ignore conformational diversity altogether. We present latent diffusion for full protein generation (LD-FPG), a framework that constructs complete all-atom protein structures, including every side-chain heavy atom, directly from molecular dynamics (MD) trajectories. LD-FPG employs a Chebyshev graph neural network (ChebNet) to obtain low-dimensional latent embeddings of protein conformations, which are processed using three pooling strategies: blind, sequential, and residue-based. A diffusion model trained on these latent representations generates new samples that a decoder, optionally regularized by dihedral-angle losses, maps back to Cartesian coordinates. Using D2R-MD, a 2-microsecond MD trajectory (12 000 frames) of the human dopamine D2 receptor in a membrane environment, the sequential and residue-based pooling strategies reproduce the reference ensemble with high structural fidelity (all-atom lDDT of approximately 0.7; Cα-lDDT of approximately 0.8) and recover backbone and side-chain dihedral-angle distributions with a Jensen-Shannon divergence of less than 0.03 compared to the MD data. LD-FPG thereby offers a practical route to system-specific, all-atom ensemble generation for large proteins, providing a promising tool for structure-based therapeutic design on complex, dynamic targets. The D2R-MD dataset and our implementation are freely available to facilitate further research.
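
The abstract reports a Jensen-Shannon divergence between generated and MD dihedral-angle distributions. The following is a minimal sketch of how such a score could be computed from two samples of angles. The function name `dihedral_jsd`, the 36-bin histogram, and the base-2 logarithm are assumptions made for illustration; the abstract does not state the binning or log base the authors used.

```python
import numpy as np

def dihedral_jsd(angles_ref, angles_gen, n_bins=36):
    """Jensen-Shannon divergence (base 2, so in [0, 1]) between two
    samples of dihedral angles given in radians.

    Hypothetical helper: bin count and log base are illustrative choices.
    """
    # Wrap angles into [-pi, pi) so both samples share the same support.
    ref = np.mod(np.asarray(angles_ref, dtype=float) + np.pi, 2 * np.pi) - np.pi
    gen = np.mod(np.asarray(angles_gen, dtype=float) + np.pi, 2 * np.pi) - np.pi

    # Histogram both samples on identical bin edges and normalize to sum to 1.
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    p, _ = np.histogram(ref, bins=edges)
    q, _ = np.histogram(gen, bins=edges)
    p = p / p.sum()
    q = q / q.sum()

    # Mixture distribution; it is nonzero wherever p or q is nonzero.
    m = 0.5 * (p + q)

    def kl(a, b):
        # Kullback-Leibler divergence, skipping bins where a is zero.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

With this convention, identical samples score 0 and samples with no overlapping bins score 1, so a value below 0.03 indicates that the two histograms are close at the chosen bin resolution.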
Problem

Research questions and friction points this paper is trying to address.

Generates full-atom protein conformations from molecular dynamics data
Addresses lack of atomic detail in protein structure modeling
Improves conformational diversity modeling for dynamic proteins like GPCRs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent diffusion for full protein generation
Chebyshev graph neural network embeddings
Dihedral-angle regularized decoder mapping
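
As background for the ChebNet embeddings listed above, here is a minimal NumPy sketch of a single Chebyshev graph convolution on a generic undirected graph. It illustrates the standard ChebNet filter only. The function names, the dense adjacency matrix, and the weight layout are assumptions for illustration and are not taken from the LD-FPG implementation, whose graph construction and layer sizes are not given here.

```python
import numpy as np

def scaled_laplacian(adj):
    """Rescale the symmetric normalized Laplacian so its spectrum lies
    in [-1, 1]. Assumes an undirected graph with at least one edge."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    # Inverse square-root degree, with isolated nodes mapped to 0.
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lam_max - np.eye(n)

def cheb_conv(x, adj, weights):
    """One Chebyshev graph convolution.

    x:       (N, F_in) node features, e.g. atom coordinates.
    adj:     (N, N) symmetric adjacency matrix.
    weights: (K, F_in, F_out) array, one weight matrix per polynomial order.
    Returns the (N, F_out) array sum_k T_k(L~) x W_k.
    """
    l_tilde = scaled_laplacian(adj)
    t_prev = x                      # T_0(L~) x = x
    out = t_prev @ weights[0]
    if len(weights) > 1:
        t_curr = l_tilde @ x        # T_1(L~) x = L~ x
        out = out + t_curr @ weights[1]
        for k in range(2, len(weights)):
            # Chebyshev recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
            t_prev, t_curr = t_curr, 2.0 * l_tilde @ t_curr - t_prev
            out = out + t_curr @ weights[k]
    return out
```

A filter of order K mixes information from nodes up to K-1 hops apart without requiring an eigendecomposition at inference time; the eigenvalue call here is used only to rescale the Laplacian and is often replaced by the upper bound of 2.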
Aditya Sengar
Signal Processing Laboratory (LTS2), EPFL, Lausanne, Switzerland; Institute of Bioengineering, EPFL, Lausanne, Switzerland
Ali Hariri
Senior Researcher at Huawei
Access Control, Usage Control, IoT Security, Network Security, Data Spaces
Daniel Probst
WUR
cheminformatics, chemistry, medical chemistry, bioinformatics, computer science
Patrick Barth
Institute of Bioengineering, EPFL, Lausanne, Switzerland; Ludwig Institute for Cancer Research, Lausanne, Switzerland
Pierre Vandergheynst
Professor of Electrical Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL)
data science, machine learning, artificial intelligence, network science, computer vision