On the Reliability of AI Methods in Drug Discovery: Evaluation of Boltz-2 for Structure and Binding Affinity Prediction

📅 2026-03-02

🤖 AI Summary

This study systematically evaluates the reliability of the AI model Boltz-2 in predicting protein–ligand complex structures and binding affinities for drug discovery. Using two large compound sets for real-world drug targets (16,780 compounds for 3CLPro and 21,702 for TNKS2), it presents an independent evaluation of Boltz-2’s “co-folding” strategy, benchmarking its predicted structures against conventional molecular docking and its predicted affinities against physics-based ESMACS binding free energies. The results indicate that while Boltz-2 is fast enough for initial screening, its structures do not converge to a single pose and its affinities correlate only weakly to moderately with the ESMACS free energies, with no significant correlation for the top 100 compounds. This leaves it insufficient for lead compound identification. The work underscores the need for physics-based methods to validate and refine AI-derived predictions and provides an empirical benchmark for future AI-enabled drug design efforts.

📝 Abstract

Despite continuing hype about the role of AI in drug discovery, no "AI-discovered drugs" have so far received regulatory approval. Here we assess one of the latest AI-based tools in this domain. The ability to rapidly predict protein-ligand structures and binding affinities is pivotal for accelerating drug discovery. Boltz-2, a recently developed biomolecular foundation model, aims to bridge the gap between AI efficiency and physics-based precision through a joint "co-folding" approach. In this study, we provide an extensive evaluation of Boltz-2 using two large-scale datasets: 16,780 compounds for 3CLPro and 21,702 compounds for TNKS2. We compare Boltz-2 predicted structures with those from traditional docking, and Boltz-2 predicted binding affinities with binding free energies derived from the physics-based ESMACS protocol. Structural analysis reveals significant global RMSD variations, indicating that Boltz-2 predicts multiple protein conformations and ligand binding positions rather than a single converged pose. Energetic evaluations exhibit only weak to moderate correlations across the global datasets. Furthermore, a focused analysis of the top 100 compounds yields no significant correlation between the Boltz-2 predictions and the binding free energies from fine-grained ESMACS, alongside an observed saturation difference in the ligand structures. Our results show that while Boltz-2 offers substantial speed for initial screening, it lacks the energetic resolution required for lead identification. These findings highlight the necessity of employing physics-based methods to ensure the reliability and refinement of AI-derived models.

Problem

Research questions and friction points this paper is trying to address.

AI reliability

drug discovery

structure prediction

binding affinity

Boltz-2

Innovation

Methods, ideas, or system contributions that make the work stand out.

Boltz-2

binding affinity prediction

protein-ligand co-folding