AI Summary
False positives caused by colloidal aggregation of small molecules remain a persistent challenge in high-throughput screening (HTS), impeding reliable hit identification.
Method: This study introduces MEGAN, an interpretable AI framework that models molecular aggregation propensity using graph neural networks (GNNs), employs SHAP for explainable feature attribution, and pioneers a molecule-level counterfactual generation algorithm enabling atomic- or functional-group-scale structural modifications to mitigate aggregation.
Contribution/Results: MEGAN goes beyond conventional medicinal chemistry intuition by identifying non-intuitive aggregation behavior and delivering actionable, structure-based optimization strategies. Experimental validation via UV-Vis spectroscopy and dynamic light scattering (DLS) supports both the predicted classification and the efficacy of a designed modification. The approach is intended to improve false-positive detection and accelerate the identification of high-quality lead compounds.
Abstract
Herein, we present the application of MEGAN, our explainable AI (xAI) model, to the identification of small colloidally aggregating molecules (SCAMs). This work offers solutions to the long-standing problem of false positives caused by SCAMs in high-throughput screening for drug discovery and demonstrates the power of xAI in classifying molecular properties that are not chemically intuitive based on our current understanding. We leverage xAI insights and molecular counterfactuals to design alternatives to problematic compounds in drug screening libraries. Additionally, we experimentally validate MEGAN's predicted classification for one of the counterfactuals and demonstrate the utility of counterfactuals for altering the aggregation properties of a compound through minor structural modifications. Integrating this method into high-throughput screening workflows will help combat and circumvent false positives, providing better lead molecules more rapidly and thus accelerating drug discovery cycles.