🤖 AI Summary
In molecular activity prediction for drug discovery, conventional models suffer from low accuracy and poor generalization when data are scarce or noisy. To address this, we propose the Similarity-Quantized Relative Learning (SQRL) paradigm, which reformulates the absolute regression task as modeling the relative activity difference between structurally similar molecular pairs. Precomputed molecular similarities (e.g., Tanimoto similarity over ECFP fingerprints) guide the training of graph neural networks (GNNs). By moving beyond independent modeling of single molecules, SQRL substantially improves robustness and data efficiency. Evaluated on multiple public and proprietary industrial datasets, SQRL achieves an average 18.7% reduction in mean absolute error (MAE) and improves AUC by over 0.12 in small-sample tasks (<500 compounds). These results indicate SQRL’s accuracy, generalization capability, and practical utility in real-world drug discovery scenarios.
📝 Abstract
Accurate prediction of molecular activities is crucial for efficient drug discovery, yet remains challenging due to limited and noisy datasets. We introduce Similarity-Quantized Relative Learning (SQRL), a learning framework that reformulates molecular activity prediction as relative difference learning between structurally similar pairs of compounds. SQRL uses precomputed molecular similarities to enhance training of graph neural networks and other architectures, and significantly improves accuracy and generalization in the low-data regimes common in drug discovery. We demonstrate its broad applicability and real-world potential through benchmarking on public datasets as well as proprietary industry data. Our findings show that leveraging similarity-aware relative differences provides an effective paradigm for molecular activity prediction.
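The pairing idea behind relative difference learning can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes fingerprints are given as sets of on-bit indices, uses Tanimoto similarity, and keeps only pairs above a hypothetical `threshold`; the function names, the threshold value, and the toy data are all illustrative assumptions. Each retained pair carries the activity difference that a model would be trained to predict.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def build_relative_pairs(fingerprints, activities, threshold=0.7):
    """Return (i, j, similarity, activity_i - activity_j) for similar pairs.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    pairs = []
    for i, j in combinations(range(len(fingerprints)), 2):
        sim = tanimoto(fingerprints[i], fingerprints[j])
        if sim >= threshold:
            # The regression target is the difference, not the absolute value.
            pairs.append((i, j, sim, activities[i] - activities[j]))
    return pairs


# Toy data (made up): three molecules as bit sets with activity values.
fps = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({7, 8, 9})]
acts = [6.5, 7.0, 4.2]
pairs = build_relative_pairs(fps, acts, threshold=0.5)
```

In this toy example only the first two molecules share enough bits to form a training pair; the third is structurally dissimilar and contributes no pair.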