AI Summary
Traditional 3D pharmacophore screening suffers from prohibitive computational cost and poor scalability when applied to ultra-large compound libraries.
Method: This work reformulates pharmacophore matching as a neural subgraph matching problem and introduces a contrastive learning framework to efficiently encode query–target relationships in the molecular 3D conformational embedding space. The method jointly encodes geometric and chemical pharmacophore features, enabling end-to-end differentiable matching and zero-shot pre-screening without target-specific training.
Contribution/Results: Evaluated on billion-scale databases, our approach achieves 10–100× speedup in pre-screening over conventional tools while maintaining hit rates comparable to state-of-the-art pharmacophore methods. It overcomes the fundamental scalability limitations of classical pharmacophore approaches, establishing a new paradigm for large-scale virtual screening that balances computational efficiency with generalization capability.
Abstract
The increasing size of screening libraries poses a significant challenge for the development of virtual screening methods for drug discovery, necessitating a re-evaluation of traditional approaches in the era of big data. Although 3D pharmacophore screening remains a prevalent technique, its application to very large datasets is limited by the computational cost of matching query pharmacophores to database molecules. In this study, we introduce PharmacoMatch, a novel contrastive learning approach based on neural subgraph matching. Our method reinterprets pharmacophore screening as an approximate subgraph matching problem and enables efficient querying of conformational databases by encoding query-target relationships in the embedding space. We conduct comprehensive investigations of the learned representations and evaluate PharmacoMatch as a pre-screening tool in a zero-shot setting. We demonstrate significantly shorter runtimes and performance metrics comparable to those of existing solutions, providing a promising speed-up for screening very large datasets.
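To make the idea of encoding query-target relationships in the embedding space concrete, the following is a minimal sketch and not the PharmacoMatch implementation. The abstract does not specify the embedding scheme, so the sketch assumes an order-embedding formulation that is common in neural subgraph matching: a query is treated as a candidate match when its embedding is elementwise less than or equal to the target embedding. The function names, the margin, the threshold, and the toy vectors are all hypothetical; in a real system the vectors would come from a trained encoder.

```python
from typing import List, Sequence


def order_penalty(query: Sequence[float], target: Sequence[float]) -> float:
    """Sum of squared violations of query[i] <= target[i].

    Assumption: a penalty of 0.0 means the pair is consistent with
    "query is an (approximate) subgraph of target".
    """
    return sum(max(0.0, q - t) ** 2 for q, t in zip(query, target))


def contrastive_loss(
    query: Sequence[float],
    target: Sequence[float],
    is_match: bool,
    margin: float = 1.0,  # hypothetical hyperparameter
) -> float:
    """Max-margin loss: matching pairs are pushed toward zero penalty,
    non-matching pairs are pushed to a penalty of at least `margin`."""
    penalty = order_penalty(query, target)
    return penalty if is_match else max(0.0, margin - penalty)


def prescreen(
    query: Sequence[float],
    database: List[Sequence[float]],
    threshold: float = 0.0,  # hypothetical cut-off
) -> List[int]:
    """Return indices of database embeddings whose penalty is <= threshold."""
    return [
        i for i, target in enumerate(database)
        if order_penalty(query, target) <= threshold
    ]


# Toy, hand-made embeddings standing in for encoder outputs.
query = [0.2, 0.5, 0.1]
database = [
    [0.3, 0.9, 0.4],  # dominates the query in every dimension
    [0.1, 0.9, 0.4],  # first dimension falls below the query
    [0.2, 0.5, 0.1],  # identical to the query
]
hits = prescreen(query, database)
print(hits)
```

Under this assumption, the database embeddings can be computed once and stored, so each query reduces to elementwise vector comparisons rather than a geometric alignment per molecule. Candidates that pass the threshold would then be handed to a conventional pharmacophore matcher for confirmation.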