🤖 AI Summary
Current AI models struggle to predict protein–ligand binding conformations and affinities together with high accuracy, particularly for multi-ligand flexible docking and for end-to-end generation of holo complexes from apo protein structures. Both are key bottlenecks in structure-based drug design. To address this, the authors propose FlowDock, the first deep geometric generative model based on conditional flow matching that maps unbound (apo) structures directly to their bound (holo) counterparts, unifying conformational sampling and affinity prediction. The method supports concurrent flexible docking of an arbitrary number of ligands, blind docking without multiple sequence alignments (MSAs), and joint output of structural confidence scores and binding affinity estimates. On the PoseBusters Benchmark, it achieves a 51% blind docking success rate from apo input structures, outperforming single-sequence AlphaFold 3. On DockGen-E, it outperforms single-sequence AlphaFold 3 and matches single-sequence Chai-1 for binding pocket generalization. In the CASP16 ligand category, it ranked among the top five methods for binding affinity estimation across 140 protein–ligand complexes.
📝 Abstract
Powerful generative AI models of protein-ligand structure have recently been proposed, but few of these methods support both flexible protein-ligand docking and affinity estimation. Of those that do, none can directly model multiple binding ligands concurrently or have been rigorously benchmarked on pharmacologically relevant drug targets, hindering their widespread adoption in drug discovery efforts. In this work, we propose FlowDock, the first deep geometric generative model based on conditional flow matching that learns to directly map unbound (apo) structures to their bound (holo) counterparts for an arbitrary number of binding ligands. Furthermore, FlowDock provides predicted structural confidence scores and binding affinity values with each of its generated protein-ligand complex structures, enabling fast virtual screening of new (multi-ligand) drug targets. For the well-known PoseBusters Benchmark dataset, FlowDock outperforms single-sequence AlphaFold 3 with a 51% blind docking success rate using unbound (apo) protein input structures and without any information derived from multiple sequence alignments, and for the challenging new DockGen-E dataset, FlowDock outperforms single-sequence AlphaFold 3 and matches single-sequence Chai-1 for binding pocket generalization. Additionally, in the ligand category of the 16th community-wide Critical Assessment of Techniques for Structure Prediction (CASP16), FlowDock ranked among the top-5 methods for pharmacological binding affinity estimation across 140 protein-ligand complexes, demonstrating the efficacy of its learned representations in virtual screening. Source code, data, and pre-trained models are available at https://github.com/BioinfoMachineLearning/FlowDock.
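The abstract describes conditional flow matching as learning a direct map from apo to holo structures. The general idea can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the straight-line interpolant, the constant target velocity, the Euler integrator, the toy coordinates, and the function names `interpolate`, `target_velocity`, and `euler_sample` are hypothetical and are not taken from the FlowDock code base. An oracle velocity field stands in for a trained network.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point at time t in [0, 1] on the straight path from x0 to x1."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Regression target for the velocity field under the straight-line path."""
    return x1 - x0


def euler_sample(x0, velocity_fn, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


# Toy coordinates (5 atoms, 3D) standing in for apo and holo structures.
rng = np.random.default_rng(0)
apo = rng.normal(size=(5, 3))
holo = apo + rng.normal(scale=0.5, size=(5, 3))

# Oracle velocity field; a trained model would be fit to target_velocity
# evaluated at interpolate(apo, holo, t) for randomly sampled t.
def oracle(x, t):
    return target_velocity(apo, holo)

pred = euler_sample(apo, oracle, n_steps=10)
```

With the oracle field, integrating from the apo coordinates recovers the holo coordinates up to floating-point error. In practice the learned field is only approximate, so the number of integration steps becomes a trade-off between speed and accuracy.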