Can AI-predicted complexes teach machine learning to compute drug binding affinity?

📅 2025-07-10

🤖 AI Summary

Problem: Experimental protein–ligand complex structures are scarce, which hinders data-driven prediction of drug binding affinity.

Method: We propose an AI-based data augmentation paradigm: (i) generating synthetic complexes at scale with protein–ligand co-folding models (e.g., AlphaFold-Multimer or RoseTTAFold-All-Atom), and (ii) automatically filtering high-quality predictions with lightweight heuristic rules, based on per-residue pLDDT, interface residue confidence, and geometric plausibility, so that the filtered predictions can substitute for experimental structures when training machine learning scoring functions.

Contribution/Results: This work is the first to systematically demonstrate that rigorously filtered AI-predicted structures can support high-accuracy affinity modeling. On standard benchmarks (e.g., PDBbind), models trained solely on filtered synthetic data perform on par with, or even better than, baselines trained on experimental structures (ΔRMSE ≤ 0.2 kcal/mol), markedly reducing reliance on experimentally determined complexes.
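The three filtering criteria named in the summary (per-residue pLDDT, interface residue confidence, geometric plausibility) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's actual rule set: the `PredictedComplex` container, the function name `passes_filter`, the threshold values, and the choice of "no steric clash but at least one close contact" as the plausibility check are all hypothetical.

```python
from dataclasses import dataclass
from math import dist
from statistics import mean

Point = tuple[float, float, float]


@dataclass
class PredictedComplex:
    """Hypothetical container for one co-folding prediction."""
    name: str
    residue_plddt: list[float]      # per-residue confidence on a 0-100 scale
    interface_residues: list[int]   # indices of residues near the ligand
    protein_atoms: list[Point]      # coordinates in angstroms
    ligand_atoms: list[Point]       # coordinates in angstroms


def passes_filter(
    c: PredictedComplex,
    min_mean_plddt: float = 70.0,       # assumed threshold
    min_interface_plddt: float = 80.0,  # assumed threshold
    clash_cutoff: float = 2.0,          # assumed: closer than this is a clash
    contact_cutoff: float = 4.5,        # assumed: farther than this is unbound
) -> bool:
    """Return True if the prediction passes all three illustrative checks."""
    if not (c.residue_plddt and c.interface_residues
            and c.protein_atoms and c.ligand_atoms):
        return False
    # 1. Overall confidence: mean per-residue pLDDT.
    if mean(c.residue_plddt) < min_mean_plddt:
        return False
    # 2. Interface confidence: mean pLDDT over ligand-contacting residues.
    interface = [c.residue_plddt[i] for i in c.interface_residues]
    if mean(interface) < min_interface_plddt:
        return False
    # 3. Geometric plausibility: the closest protein-ligand atom pair must be
    #    neither a steric clash nor too far apart to count as a contact.
    closest = min(dist(p, l) for p in c.protein_atoms for l in c.ligand_atoms)
    return clash_cutoff <= closest <= contact_cutoff


candidates = [
    PredictedComplex("confident", [90.0, 85.0, 80.0], [0, 1],
                     [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]),
    PredictedComplex("clashing", [90.0, 85.0, 80.0], [0, 1],
                     [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]),
    PredictedComplex("uncertain", [50.0, 55.0, 60.0], [0, 1],
                     [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]),
]
kept = [c.name for c in candidates if passes_filter(c)]
```

None of the checks needs a reference structure, which is the property that would let such a filter run on predictions for complexes that have never been solved experimentally.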

📝 Abstract

We evaluate the feasibility of using co-folding models for synthetic data augmentation in training machine learning-based scoring functions (MLSFs) for binding affinity prediction. Our results show that performance gains depend critically on the structural quality of augmented data. In light of this, we established simple heuristics for identifying high-quality co-folding predictions without reference structures, enabling them to substitute for experimental structures in MLSF training. Our study informs future data augmentation strategies based on co-folding models.
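One way to make "performance gains" concrete is to compare the test-set error of a scoring function trained on synthetic structures against one trained on experimental structures. The sketch below assumes RMSE is the metric and that both models are scored on the same held-out affinities; the function names `rmse` and `delta_rmse` and the numbers are illustrative only, not results from the paper.

```python
from math import sqrt


def rmse(predicted: list[float], measured: list[float]) -> float:
    """Root-mean-square error between predicted and measured affinities."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return sqrt(sum(squared) / len(squared))


def delta_rmse(pred_synthetic: list[float],
               pred_experimental: list[float],
               measured: list[float]) -> float:
    """RMSE of the synthetic-trained model minus that of the experimental-trained one.

    A value near zero means the two training sets yield comparable accuracy;
    a negative value means the synthetic-trained model did better.
    """
    return rmse(pred_synthetic, measured) - rmse(pred_experimental, measured)


# Made-up affinities purely to exercise the functions.
measured = [5.0, 6.0, 7.0]
from_experimental = [5.0, 6.0, 7.0]
from_synthetic = [6.0, 7.0, 8.0]
gap = delta_rmse(from_synthetic, from_experimental, measured)
```

Repeating the comparison for synthetic training sets of differing structural quality would show how strongly the gap depends on that quality.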

Problem

Research questions and friction points this paper is trying to address.

Feasibility of AI-predicted complexes for drug affinity prediction

Impact of structural quality on machine learning performance

Heuristics for identifying high-quality co-folding predictions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using co-folding models for data augmentation

Heuristics for high-quality co-folding predictions

Substituting experimental structures with AI predictions

🔎 Similar Papers

Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches