ToxBench: A Binding Affinity Prediction Benchmark with AB-FEP-Calculated Labels for Human Estrogen Receptor Alpha

📅 2025-07-11
🀖 AI Summary
Protein–ligand binding affinity prediction faces a dual bottleneck: high-accuracy physics-based methods such as absolute binding free energy perturbation (AB-FEP) are computationally prohibitive for high-throughput use, while machine learning (ML) models are limited by the scarcity of reliable training data. To address this, the paper introduces ToxBench, the first large-scale AB-FEP dataset designed for ML development and focused on human estrogen receptor α (ERα). It comprises 8,770 ERα–ligand complex structures with AB-FEP-computed binding free energies, a subset of which is validated against experimental affinities at an RMSE of 1.75 kcal/mol, and it provides non-overlapping ligand splits for assessing generalization. The paper also proposes DualBind, an ML model that uses a dual-loss framework to learn the binding energy function. In the ToxBench benchmarks, DualBind outperforms the other ML methods evaluated, which suggests that ML can approximate AB-FEP at a fraction of the computational cost for applications in toxicity assessment and drug discovery.

📝 Abstract
Protein-ligand binding affinity prediction is essential for drug discovery and toxicity assessment. While machine learning (ML) promises fast and accurate predictions, its progress is constrained by the availability of reliable data. In contrast, physics-based methods such as absolute binding free energy perturbation (AB-FEP) deliver high accuracy but are computationally prohibitive for high-throughput applications. To bridge this gap, we introduce ToxBench, the first large-scale AB-FEP dataset designed for ML development and focused on a single pharmaceutically critical target, Human Estrogen Receptor Alpha (ERα). ToxBench contains 8,770 ERα-ligand complex structures with binding free energies computed via AB-FEP, with a subset validated against experimental affinities at 1.75 kcal/mol RMSE, along with non-overlapping ligand splits to assess model generalizability. Using ToxBench, we further benchmark state-of-the-art ML methods, and notably, our proposed DualBind model, which employs a dual-loss framework to effectively learn the binding energy function. The benchmark results demonstrate the superior performance of DualBind and the potential of ML to approximate AB-FEP at a fraction of the computational cost.
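The 1.75 kcal/mol figure above is a root-mean-square error (RMSE) between AB-FEP-computed and experimental affinities. As a minimal sketch of how such a number is computed, the snippet below implements RMSE in plain Python. The function name `rmse` and the four example values are assumptions for illustration only; they are not ToxBench data.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences (e.g. in kcal/mol)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical binding free energies for illustration only (not from ToxBench).
computed = [-9.0, -7.5, -11.0, -8.0]
experimental = [-10.0, -7.5, -9.0, -9.0]
example_rmse = rmse(computed, experimental)
print(f"RMSE = {example_rmse:.3f} kcal/mol")
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones do.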
Problem

Research questions and friction points this paper is trying to address.

Bridging ML and physics-based binding affinity prediction gaps
Providing reliable AB-FEP data for Human ERα target
Assessing ML model generalizability with non-overlapping ligands
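One way to obtain non-overlapping ligand splits is to partition the set of ligand identifiers first and then route every complex to the split that owns its ligand. The sketch below illustrates that idea under stated assumptions: the `ligand_id` field, the 80/10/10 fractions, and the function name `split_by_ligand` are hypothetical, and the page does not describe the actual splitting protocol used for ToxBench.

```python
import random


def split_by_ligand(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign all records sharing a ligand ID to exactly one of train/val/test."""
    # Partition the unique ligand IDs, not the individual records.
    ligand_ids = sorted({r["ligand_id"] for r in records})
    random.Random(seed).shuffle(ligand_ids)

    n = len(ligand_ids)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    groups = {
        "train": set(ligand_ids[:n_train]),
        "val": set(ligand_ids[n_train:n_train + n_val]),
        "test": set(ligand_ids[n_train + n_val:]),
    }

    # Route each record to the split that owns its ligand.
    splits = {name: [] for name in groups}
    for record in records:
        for name, ids in groups.items():
            if record["ligand_id"] in ids:
                splits[name].append(record)
                break
    return splits


# Hypothetical records: 10 ligands with 3 entries each (illustration only).
records = [{"ligand_id": f"lig{i % 10}", "entry": i} for i in range(30)]
splits = split_by_ligand(records)
```

Splitting on ligand identity rather than on individual records means a model is never evaluated on a ligand it saw during training, which is the property the benchmark relies on to measure generalization.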
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale AB-FEP dataset for ML development
DualBind model with dual-loss framework
Benchmarking ML methods for binding affinity
Authors

Meng Liu, NVIDIA
Karl Leswing, Schrödinger
Simon K. S. Chu, NVIDIA
Farhad Ramezanghorbani, NVIDIA
Griffin Young, Schrödinger
Gabriel Marques, Schrödinger
Prerna Das, Schrödinger
Anjali Panikar, Schrödinger
Esther Jamir, Schrödinger
Mohammed Sulaiman Shamsudeen, Schrödinger
K. Shawn Watts, Schrödinger
Ananya Sen, Schrödinger
Hari Priya Devannagari, Schrödinger
Edward B. Miller, Schrödinger
Muyun Lihan, Schrödinger
Howook Hwang, Schrödinger
Janet Paulsen, NVIDIA
Xin Yu, NVIDIA
Kyle Gion, NVIDIA
Timur Rvachov, NVIDIA
Emine Kucukbenli, International School for Advanced Studies (SISSA), Trieste
Saee Gopal Paliwal, NVIDIA