Bioptic - A Target-Agnostic Potency-Based Small Molecules Search Engine

📅 2024-06-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Efficiently retrieving structurally diverse yet biologically similar (i.e., potency-similar) molecules from ultra-large chemical libraries remains a critical challenge in reverse drug discovery. Method: We propose a target-agnostic, potency-driven small-molecule search engine. Our approach introduces a potency-oriented molecular representation that decouples similarity assessment from target-specific information. It leverages large-model-pretrained potency embeddings and accelerates similarity search with processor-level SIMD instruction optimization. Further, we design a target-free contrastive learning framework to improve generalization across diverse bioactivity contexts. Results: Evaluated on the 40-billion-molecule Enamine REAL library, our method achieves millisecond-scale latency with a 100% recall rate and is benchmarked against state-of-the-art baselines for both speed and retrieval quality. To our knowledge, this is the first work enabling real-time, high-fidelity potency-similarity retrieval over a library of tens of billions of molecules (about 4 × 10¹⁰), establishing a scalable, AI-powered paradigm for target-agnostic reverse drug discovery.

📝 Abstract
Recent successes in virtual screening have been made possible by large models and extensive chemical libraries. However, combining these elements is challenging: the larger the model, the more expensive it is to run, which makes screening ultra-large libraries infeasible. To address this, we developed a target-agnostic, efficacy-based molecule search model that allows us to find structurally dissimilar molecules with similar biological activities. We followed best practices to design a fast retrieval system based on processor-optimized SIMD instructions, enabling us to screen the ultra-large 40B Enamine REAL library with a 100% recall rate. We extensively benchmarked our model and several state-of-the-art models for both speed and retrieval quality of novel molecules.
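A 100% recall rate is what an exhaustive (brute-force) search gives: every library molecule is scored against the query, so no true neighbor can be missed. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes molecules are already encoded as fixed-length embedding vectors and that similarity is cosine similarity, neither of which is stated in the abstract. The function name `exact_top_k` is hypothetical. NumPy hands the matrix-vector product to compiled, vectorized kernels, which stands in here for hand-tuned SIMD code.

```python
import numpy as np


def exact_top_k(query, library, k):
    """Return (indices, scores) of the k library rows most similar to query.

    Assumption: rows of `library` and `query` are embedding vectors, and
    similarity is cosine similarity (not specified in the source).
    """
    # Normalize so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)

    # Score every row: exhaustive search, so recall is 100% by construction.
    scores = lib @ q

    k = min(k, scores.shape[0])
    # argpartition selects the k best in linear time; only those k are sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top, scores[top]
```

Calling `exact_top_k(query, library, 5)` returns the five nearest rows in descending order of similarity. The cost grows linearly with library size, which is why the per-row scoring step is the part worth optimizing at the instruction level.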
Problem

Research questions and friction points this paper is trying to address.

Develops target-agnostic model for finding biologically similar molecules
Enables efficient screening of ultra-large chemical libraries
Optimizes speed and recall rate for novel molecule retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Target-agnostic potency-based molecule search
Processor-optimized SIMD fast retrieval
Screens ultra-large libraries efficiently
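One common way to scan a library too large to hold in memory while keeping the search exact is to stream it in chunks and keep a running top-k. The sketch below is a generic illustration under assumptions, not the system described in the paper: the chunked layout, the cosine-similarity scoring, and the name `streaming_top_k` are all hypothetical.

```python
import heapq

import numpy as np


def streaming_top_k(query, chunks, k):
    """Exact top-k over an iterable of 2-D embedding chunks.

    Returns a list of (score, global_index) pairs, best first.
    Assumption: cosine similarity over precomputed embeddings.
    """
    q = query / np.linalg.norm(query)
    best = []   # min-heap holding the k best (score, index) pairs so far
    offset = 0  # global index of the first row in the current chunk

    for chunk in chunks:
        n = len(chunk)
        if n == 0:
            continue
        scores = (chunk @ q) / np.linalg.norm(chunk, axis=1)

        # Only the k best rows of a chunk can enter the global top-k.
        kk = min(k, n)
        local = np.argpartition(-scores, kk - 1)[:kk]

        for i in local:
            item = (float(scores[i]), offset + int(i))
            if len(best) < k:
                heapq.heappush(best, item)
            elif item > best[0]:
                # Better than the current worst kept result: swap it in.
                heapq.heapreplace(best, item)
        offset += n

    return sorted(best, reverse=True)
```

Memory use is bounded by one chunk plus k results, and every row is still scored, so the result matches a single-pass exhaustive search.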
Vlad Vinogradov
Optic Inc.
Ivan Izmailov
Optic Inc.
Simon Steshin
Optic Inc.
Kong T. Nguyen
Optic Inc.