Bioptic - A Target-Agnostic Potency-Based Small Molecules Search Engine

📅 2024-06-13

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Efficiently retrieving structurally diverse yet biologically similar (i.e., potency-similar) molecules from ultra-large chemical libraries remains a critical challenge in virtual screening. Method: We propose a target-agnostic, potency-driven small-molecule search engine. Our approach introduces a potency-oriented molecular representation that decouples similarity assessment from target-specific information. It uses potency embeddings from a large pretrained model and accelerates similarity search with processor-level SIMD instruction optimization. We further design a target-free contrastive learning framework to improve generalization across diverse bioactivity contexts. Results: Evaluated on the 40-billion-molecule Enamine REAL library, our method achieves millisecond-scale latency with 100% recall, significantly outperforming state-of-the-art baselines. To our knowledge, this is the first work enabling real-time, high-fidelity potency-similarity retrieval over a library of this scale (about 4 × 10¹⁰ molecules), establishing a scalable, AI-powered paradigm for target-agnostic virtual screening.
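
The core retrieval step described above, ranking library molecules by how similar their embeddings are to a query embedding, can be sketched as an exhaustive cosine-similarity search. This is a minimal sketch, not the paper's engine: it assumes precomputed fixed-length float32 embeddings, and the random 64-dimensional vectors and the helper names `normalize` and `exact_top_k` are hypothetical. Because every library vector is scored, the result is exact. NumPy's matrix-vector product usually dispatches to an optimized BLAS library that uses SIMD instructions, so this shows the same general idea at small scale.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def exact_top_k(query: np.ndarray, library: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Return indices and cosine scores of the k library rows most similar to query.

    Every row is scored, so nothing is missed (100% recall for this similarity).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    q = normalize(query.astype(np.float32))
    lib = normalize(library.astype(np.float32))
    scores = lib @ q  # one vectorized matrix-vector product over the whole library
    k = min(k, scores.shape[0])
    top = np.argpartition(-scores, k - 1)[:k]  # unordered top-k in roughly linear time
    order = np.argsort(-scores[top])  # sort only the k survivors
    return top[order], scores[top[order]]


# Hypothetical data: 10,000 random 64-d vectors stand in for learned potency embeddings.
rng = np.random.default_rng(0)
library = rng.standard_normal((10_000, 64)).astype(np.float32)
query = library[42] + 0.01 * rng.standard_normal(64).astype(np.float32)
idx, sim = exact_top_k(query, library, k=5)
print(idx[0])  # 42, the perturbed source vector ranks first
```

Using `argpartition` before sorting avoids a full sort of the library scores, which matters once the library is much larger than k.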

📝 Abstract

Recent successes in virtual screening have been made possible by large models and extensive chemical libraries. However, combining the two is challenging: the larger the model, the more expensive it is to run, which makes screening ultra-large libraries infeasible. To address this, we developed a target-agnostic, efficacy-based molecule search model that finds structurally dissimilar molecules with similar biological activities. We followed best practices to design a fast retrieval system based on processor-optimized SIMD instructions, which enabled us to screen the ultra-large 40B Enamine REAL library with a 100% recall rate. We extensively benchmarked our model and several state-of-the-art models for both speed and retrieval quality of novel molecules.
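
Recall in retrieval benchmarks is commonly defined relative to exhaustive search: recall@k is the fraction of the true top-k neighbours (from an exact scan) that a retrieval system actually returns. A minimal sketch, assuming both result lists are given as molecule indices; the function name `recall_at_k` and the example IDs are hypothetical.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[int], exact: Sequence[int], k: int) -> float:
    """Fraction of the exact top-k neighbours that appear in the retrieved top-k."""
    if k < 1:
        raise ValueError("k must be positive")
    truth = set(exact[:k])
    if not truth:
        raise ValueError("exact result list is empty")
    found = set(retrieved[:k])
    return len(truth & found) / len(truth)


# Hypothetical example: an approximate index misses 2 of the 10 true neighbours.
exact_ids = [3, 17, 42, 8, 99, 5, 61, 23, 70, 12]
approx_ids = [3, 17, 42, 8, 99, 5, 61, 23, 11, 14]
print(recall_at_k(approx_ids, exact_ids, k=10))  # 0.8
print(recall_at_k(exact_ids, exact_ids, k=10))   # 1.0
```

Approximate-nearest-neighbour indexes typically trade some recall for speed, whereas an exhaustive scan scores 1.0 by construction.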

Problem

Develops target-agnostic model for finding biologically similar molecules

Enables efficient screening of ultra-large chemical libraries

Optimizes speed and recall rate for novel molecule retrieval
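
To see why ultra-large libraries are hard to screen, the raw storage for precomputed embeddings can be estimated with simple arithmetic. This sketch assumes one fixed-length embedding per molecule and ignores index and ID overhead; the 128-dimension size and the three precisions are illustrative assumptions, not the model's actual configuration, and `embedding_storage_bytes` is a hypothetical helper.

```python
def embedding_storage_bytes(n_molecules: int, dim: int, bits_per_value: int) -> int:
    """Raw bytes for one dim-length embedding per molecule (no index or ID overhead)."""
    total_bits = n_molecules * dim * bits_per_value
    return (total_bits + 7) // 8  # round up to whole bytes


N_REAL = 40_000_000_000  # 40B molecules, the Enamine REAL size cited in the abstract

# Hypothetical configurations: 128 dimensions stored at three precisions.
for bits, label in [(32, "float32"), (16, "float16"), (1, "1-bit")]:
    terabytes = embedding_storage_bytes(N_REAL, 128, bits) / 1e12
    print(f"128-d {label:>7}: {terabytes:6.2f} TB")
# 128-d float32:  20.48 TB
# 128-d float16:  10.24 TB
# 128-d   1-bit:   0.64 TB
```

Even under these assumptions, one exhaustive pass reads hundreds of gigabytes to tens of terabytes, which helps explain why compact embeddings and SIMD-friendly scanning matter at this scale.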

Innovation

Target-agnostic potency-based molecule search

Processor-optimized SIMD fast retrieval

Screens ultra-large libraries efficiently
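
A library too large for RAM can still be searched exactly with a streaming scan: score one chunk of embeddings at a time and keep only a running top-k in a small heap. This is a minimal sketch assuming L2-normalized NumPy chunks (for example read from memory-mapped files); the generator `iter_chunks`, the function `streaming_top_k`, and all sizes are hypothetical stand-ins, not the paper's implementation. Because every chunk is scored, the merged result matches a single exhaustive scan.

```python
import heapq
from typing import Iterable, Iterator

import numpy as np


def iter_chunks(n: int, dim: int, chunk: int, seed: int = 0) -> Iterator[np.ndarray]:
    """Hypothetical data source: yields L2-normalized random embedding chunks."""
    rng = np.random.default_rng(seed)
    for start in range(0, n, chunk):
        x = rng.standard_normal((min(chunk, n - start), dim)).astype(np.float32)
        yield x / np.linalg.norm(x, axis=1, keepdims=True)


def streaming_top_k(query: np.ndarray, chunks: Iterable[np.ndarray], k: int) -> list[tuple[float, int]]:
    """Exact top-k (score, global_index) pairs by cosine similarity, one chunk at a time.

    Assumes every chunk is already L2-normalized.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    q = (query / np.linalg.norm(query)).astype(np.float32)
    heap: list[tuple[float, int]] = []  # min-heap holding the best k seen so far
    offset = 0  # global index of the current chunk's first row
    for block in chunks:
        scores = block @ q  # vectorized scoring of one chunk
        m = min(k, len(scores))
        if m > 0:
            # Only a chunk's own top-m rows can enter the global top-k.
            for i in np.argpartition(-scores, m - 1)[:m]:
                item = (float(scores[i]), offset + int(i))
                if len(heap) < k:
                    heapq.heappush(heap, item)
                elif item > heap[0]:
                    heapq.heapreplace(heap, item)  # evict the current weakest hit
        offset += len(scores)
    return sorted(heap, reverse=True)  # best hit first


query = np.random.default_rng(1).standard_normal(64).astype(np.float32)
for score, idx in streaming_top_k(query, iter_chunks(n=100_000, dim=64, chunk=8_192), k=5):
    print(f"molecule {idx}: cosine {score:.3f}")
```

Memory use stays bounded by one chunk plus k heap entries, so the chunk size can be tuned to cache or RAM limits without changing the result.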

🔎 Similar Papers

PharmacoMatch: Efficient 3D Pharmacophore Screening through Neural Subgraph Matching