Deep Learning for Protein-Ligand Docking: Are We There Yet?

📅 2024-05-23

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work addresses the generalization bottleneck of deep learning (DL) methods for protein-ligand docking in realistic scenarios, focusing on three challenges: (1) docking into predicted (apo) protein structures, which matters for targets without an experimental structure; (2) docking multiple ligands concurrently to one target protein (e.g., for enzyme design); and (3) pocket-agnostic docking, i.e., docking without prior knowledge of the binding site. To evaluate docking methods under these conditions, the authors introduce PoseBench, described as the first comprehensive benchmark for broadly applicable protein-ligand docking, and release it publicly with both single- and multi-ligand benchmark datasets. Using PoseBench, they find that DL methods consistently outperform conventional docking algorithms, but that most recent DL docking methods fail to generalize to multi-ligand protein targets. They also identify training DL methods with physics-informed loss functions on diverse clusters of protein-ligand complexes as a promising direction for future work.

📝 Abstract

The effects of ligand binding on protein structures and their in vivo functions carry numerous implications for modern biomedical research and biotechnology development efforts such as drug discovery. Although several deep learning (DL) methods and benchmarks designed for protein-ligand docking have recently been introduced, to date no prior works have systematically studied the behavior of docking methods within the broadly applicable context of (1) using predicted (apo) protein structures for docking (e.g., for applicability to unknown structures); (2) docking multiple ligands concurrently to a given target protein (e.g., for enzyme design); and (3) having no prior knowledge of binding pockets (e.g., for unknown pocket generalization). To enable a deeper understanding of docking methods’ real-world utility, we introduce PoseBench, the first comprehensive benchmark for broadly applicable protein-ligand docking. PoseBench enables researchers to rigorously and systematically evaluate DL docking methods for apo-to-holo protein-ligand docking and protein-ligand structure generation using both single and multi-ligand benchmark datasets, the latter of which we introduce for the first time to the DL community. Empirically, using PoseBench, we find that (1) DL methods consistently outperform conventional docking algorithms; (2) most recent DL docking methods fail to generalize to multi-ligand protein targets; and (3) training DL methods with physics-informed loss functions on diverse clusters of protein-ligand complexes is a promising direction for future work. Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench.
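As a rough illustration of what evaluating docked poses can involve, the sketch below computes a ligand RMSD against a reference pose and the fraction of poses under a cutoff. The abstract does not name its metrics, so the 2 Å threshold (a common convention in docking evaluation) and the function names `ligand_rmsd` and `success_rate` are assumptions made for this example, not PoseBench's API.

```python
import math


def ligand_rmsd(pred, ref):
    """RMSD (in Å) between two poses given as lists of (x, y, z) tuples.

    Assumes both poses list the same atoms in the same order and share
    one coordinate frame; no alignment or symmetry correction is done.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have equal atom counts")
    squared = sum(
        (p - r) ** 2
        for atom_pred, atom_ref in zip(pred, ref)
        for p, r in zip(atom_pred, atom_ref)
    )
    return math.sqrt(squared / len(pred))


def success_rate(rmsds, threshold=2.0):
    """Fraction of poses whose RMSD is at or below the threshold (assumed 2 Å)."""
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    return sum(r <= threshold for r in rmsds) / len(rmsds)


# Toy data: a two-atom reference ligand and two hypothetical predictions.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # shifted 1 Å along z
far_pose = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]   # shifted 3 Å along z

rmsds = [ligand_rmsd(near_pose, reference), ligand_rmsd(far_pose, reference)]
print(rmsds, success_rate(rmsds))
```

A real evaluation would also need to handle symmetric ligands and, for multi-ligand targets, score each ligand separately; this sketch leaves both out.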

Problem

Research questions and friction points this paper is trying to address.

Evaluating deep learning for protein-ligand docking.

Assessing docking with predicted protein structures.

Benchmarking methods for multi-ligand binding scenarios.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic evaluation of DL docking methods in apo-to-holo, multi-ligand, and pocket-agnostic settings

PoseBench comprehensive benchmark

Multi-ligand benchmark datasets
