ToxBench: A Binding Affinity Prediction Benchmark with AB-FEP-Calculated Labels for Human Estrogen Receptor Alpha

📅 2025-07-11

🤖 AI Summary
Protein-ligand binding affinity prediction faces a dual bottleneck: high-accuracy physics-based methods such as absolute binding free energy perturbation (AB-FEP) are too computationally expensive for high-throughput use, while machine learning (ML) models are limited by the scarcity of reliable labels. To address this, we introduce ToxBench, the first large-scale AB-FEP dataset designed for ML development and focused on human estrogen receptor α (ERα). It comprises 8,770 ERα-ligand complex structures with AB-FEP-computed binding free energies, a subset of which agrees with experimental affinities at a root-mean-square error (RMSE) of 1.75 kcal/mol, and it provides non-overlapping ligand splits for evaluating generalization. We also benchmark state-of-the-art ML methods alongside our proposed DualBind model, which uses a dual-loss framework to learn the binding energy function. DualBind performs best in the benchmark, indicating that ML can approximate AB-FEP at a fraction of the computational cost for toxicity assessment and drug discovery.

📝 Abstract
Protein-ligand binding affinity prediction is essential for drug discovery and toxicity assessment. While machine learning (ML) promises fast and accurate predictions, its progress is constrained by the availability of reliable data. In contrast, physics-based methods such as absolute binding free energy perturbation (AB-FEP) deliver high accuracy but are computationally prohibitive for high-throughput applications. To bridge this gap, we introduce ToxBench, the first large-scale AB-FEP dataset designed for ML development and focused on a single pharmaceutically critical target, Human Estrogen Receptor Alpha (ERα). ToxBench contains 8,770 ERα-ligand complex structures with binding free energies computed via AB-FEP, a subset of which was validated against experimental affinities at an RMSE of 1.75 kcal/mol, along with non-overlapping ligand splits to assess model generalizability. Using ToxBench, we further benchmark state-of-the-art ML methods, including our proposed DualBind model, which employs a dual-loss framework to effectively learn the binding energy function. The benchmark results demonstrate the superior performance of DualBind and the potential of ML to approximate AB-FEP at a fraction of the computational cost.
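
The abstract reports agreement between AB-FEP and experiment as an RMSE in kcal/mol. As a minimal sketch of how such a figure is computed, the snippet below evaluates the RMSE between two lists of binding free energies. The function name `rmse` and the four energy values are hypothetical illustrations, not data from ToxBench.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences (same units)."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical binding free energies in kcal/mol (illustration only).
computed_dg = [-9.0, -7.0, -11.0, -8.0]
experimental_dg = [-10.0, -7.0, -9.0, -8.0]

value = rmse(computed_dg, experimental_dg)
print(f"RMSE = {value:.3f} kcal/mol")
```

Because the errors are squared before averaging, a few large deviations dominate the result, which is why RMSE is a stricter summary than mean absolute error for the same data.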
Problem

Research questions and friction points this paper is trying to address.

Bridging ML and physics-based binding affinity prediction gaps
Providing reliable AB-FEP data for Human ERα target
Assessing ML model generalizability with non-overlapping ligands
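
A non-overlapping ligand split means no ligand in the test set also appears in training. The sketch below shows one simple way to build such a split by partitioning ligand identifiers rather than individual complexes. It is an illustration under stated assumptions, not the procedure used to construct ToxBench: the record layout `(ligand_id, complex_id)`, the 80/10/10 fractions, the random assignment, and the possibility of several complexes per ligand are all hypothetical.

```python
import random
from collections import defaultdict


def ligand_disjoint_split(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split (ligand_id, complex_id) records so that no ligand spans two subsets."""
    by_ligand = defaultdict(list)
    for ligand_id, complex_id in records:
        by_ligand[ligand_id].append(complex_id)

    # Shuffle ligand identifiers, not complexes, so every complex of a
    # given ligand lands in the same subset.
    ligands = sorted(by_ligand)
    random.Random(seed).shuffle(ligands)

    n_train = int(fractions[0] * len(ligands))
    n_val = int(fractions[1] * len(ligands))
    groups = {
        "train": ligands[:n_train],
        "val": ligands[n_train:n_train + n_val],
        "test": ligands[n_train + n_val:],
    }
    return {name: {lig: by_ligand[lig] for lig in ligs} for name, ligs in groups.items()}


# Hypothetical records: 10 ligands, each with two complex structures.
records = [(f"lig{i}", f"lig{i}_complex{j}") for i in range(10) for j in range(2)]
split = ligand_disjoint_split(records)

for name, subset in split.items():
    print(name, len(subset), "ligands")
```

Splitting at the ligand level prevents a model from scoring well simply by memorizing ligands it has already seen. A stricter variant would group by chemical scaffold or similarity cluster instead of by exact ligand identity.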
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale AB-FEP dataset for ML development
DualBind model with dual-loss framework
Benchmarking ML methods for binding affinity
Authors

Meng Liu
NVIDIA
Karl Leswing
Schrödinger
Simon K. S. Chu
NVIDIA
Farhad Ramezanghorbani
NVIDIA
Griffin Young
Schrödinger
Gabriel Marques
Schrödinger
Prerna Das
Schrödinger
Anjali Panikar
Schrödinger
Esther Jamir
Schrödinger
Mohammed Sulaiman Shamsudeen
Schrödinger
K. Shawn Watts
Schrödinger
Ananya Sen
Schrödinger
Hari Priya Devannagari
Schrödinger
Edward B. Miller
Schrödinger
Muyun Lihan
Schrödinger
Howook Hwang
Schrödinger
Janet Paulsen
NVIDIA
Xin Yu
NVIDIA
Kyle Gion
NVIDIA
Timur Rvachov
NVIDIA
Emine Kucukbenli
International School for Advanced Studies (SISSA), Trieste
molecular crystal structure prediction, ab initio NMR, ab initio van der Waals, DFT+Hubbard, machine learning with DFT
Saee Gopal Paliwal
NVIDIA