🤖 AI Summary
Protein–ligand binding affinity prediction remains hindered by limited generalizability, poor interpretability, and insufficient robustness in low-data regimes. Method: This study conducts a systematic review and empirical evaluation of affinity prediction approaches (physics-based models, traditional machine learning such as random forest and XGBoost, and deep learning architectures such as GCN and SE(3)-Transformer), integrating molecular docking, 3D conformation generation, sequence/structure embeddings, and energy-based features. Contribution/Results: We present the first comprehensive analysis of methodological evolution, identify three critical bottlenecks, and propose a novel paradigm combining multi-scale representation fusion with physics-informed learning. We establish data quality, negative sample construction, and cross-target transfer as key determinants of performance. Our framework achieves R² = 0.82 on PDBbind v2020 and delivers a principled, AI-driven methodology for drug discovery.
📝 Abstract
Protein-ligand binding is the process by which a small molecule (a drug or inhibitor) attaches to a target protein. Binding affinity, the strength of this interaction, is central to many important problems in bioinformatics, such as drug design. Because of this significance, extensive work has been devoted to predicting binding affinity over the past decades. In this paper, we review the significant recent works, focusing on their methods, features, and benchmark datasets. We observe a rising trend in the use of traditional machine learning and deep learning models for predicting binding affinity, accompanied by a growing amount of data on proteins and small drug-like molecules. While prediction results continue to improve, we also identify several open questions and potential directions that remain unexplored in the field. This paper can serve as a starting point for machine learning researchers who wish to study binding affinity, and for anyone with a general interest in machine learning, drug discovery, or bioinformatics.