Can AI-predicted complexes teach machine learning to compute drug binding affinity?

📅 2025-07-10
🤖 AI Summary
Problem: Experimental protein–ligand complex structures are scarce, which hinders data-driven prediction of drug binding affinity.
Method: We propose an AI-based data augmentation paradigm: (i) generating synthetic complexes at scale using protein–ligand co-folding models (e.g., RoseTTAFold All-Atom), and (ii) automatically filtering for high-quality predictions with lightweight heuristic rules based on per-residue pLDDT, interface residue confidence, and geometric plausibility, so that the retained predictions can substitute for experimental structures when training machine learning scoring functions.
Contribution/Results: This work is the first to systematically demonstrate that rigorously filtered AI-predicted structures can support high-accuracy affinity modeling. On standard benchmarks (e.g., PDBbind), models trained solely on filtered synthetic data perform on par with, or even better than, baselines trained on experimental structures (ΔRMSE ≤ 0.2 kcal/mol), markedly reducing reliance on experimentally determined complexes.
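The filtering step described in the summary can be illustrated with a minimal Python sketch. It assumes each co-folding prediction provides per-residue pLDDT values, a list of interface residue indices, and heavy-atom coordinates. The `Prediction` container, the `passes_filter` function, and every threshold are hypothetical choices for illustration, not the cutoffs or code used in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import dist
from statistics import mean

Coord = tuple[float, float, float]


@dataclass
class Prediction:
    """One co-folded complex (hypothetical container, not from the paper)."""
    name: str
    plddt: list[float]        # per-residue pLDDT, 0-100
    interface: list[int]      # indices of residues lining the binding site
    protein_xyz: list[Coord]  # protein heavy-atom coordinates, in angstroms
    ligand_xyz: list[Coord]   # ligand heavy-atom coordinates, in angstroms


# Illustrative thresholds only (assumptions, not values from the paper).
MIN_MEAN_PLDDT = 70.0
MIN_INTERFACE_PLDDT = 80.0
CLASH_DIST = 2.0     # closer than this counts as a steric clash
CONTACT_DIST = 4.5   # farther than this means the ligand is not bound


def passes_filter(p: Prediction) -> bool:
    """Return True if the prediction passes all three heuristic checks."""
    if not (p.plddt and p.interface and p.protein_xyz and p.ligand_xyz):
        return False
    # 1. Overall model confidence.
    if mean(p.plddt) < MIN_MEAN_PLDDT:
        return False
    # 2. Confidence of the residues that contact the ligand.
    if mean(p.plddt[i] for i in p.interface) < MIN_INTERFACE_PLDDT:
        return False
    # 3. Geometric plausibility: the ligand touches the protein without clashing.
    closest = min(dist(a, b) for a in p.protein_xyz for b in p.ligand_xyz)
    return CLASH_DIST <= closest <= CONTACT_DIST
```

A reference-free filter of this kind needs only the model's own confidence output and the predicted coordinates, which is what makes it applicable to complexes that have no experimental structure.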

📝 Abstract
We evaluate the feasibility of using co-folding models for synthetic data augmentation in training machine learning-based scoring functions (MLSFs) for binding affinity prediction. Our results show that performance gains depend critically on the structural quality of augmented data. In light of this, we established simple heuristics for identifying high-quality co-folding predictions without reference structures, enabling them to substitute for experimental structures in MLSF training. Our study informs future data augmentation strategies based on co-folding models.

Problem

Research questions and friction points this paper is trying to address.

Feasibility of AI-predicted complexes for drug affinity prediction
Impact of structural quality on machine learning performance
Heuristics for identifying high-quality co-folding predictions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using co-folding models for data augmentation
Heuristics for high-quality co-folding predictions
Substituting experimental structures with AI predictions

Wei-Tse Hsu
Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK
Savva Grevtsev
Department of Chemistry, University of Oxford, Mansfield Road, Oxford, OX1 3TA, UK
Thomas Douglas
Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK
Aniket Magarkar
Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397 Biberach an der Riß, Germany
Philip C. Biggin
Professor of Computational Biochemistry, University of Oxford
Structural bioinformatics, computational chemistry, drug-ligand interactions, ionotropic glutamate receptors, cys-loop receptors