AI Summary
Antibiotic resistance poses a global health crisis, yet conventional drug discovery remains inefficient and costly, and existing machine learning models fail to fully exploit the representational power of multimodal molecular data. To address this, we propose the first contrastive learning-based multimodal molecular foundation model, integrating SMILES strings, molecular graphs, and Morgan fingerprints. Our method introduces a biologically informed bi-level routing attention mechanism within a unified tri-modal encoder framework, and is the first to apply unsupervised multimodal contrastive learning to antibiotic-specific representation learning. The model is pre-trained on ChEMBL (1.6 million molecules) using a Transformer with Rotary Position Embedding (RoPE) for SMILES encoding, a Transformer with bi-level routing attention for graph encoding, and an MLP-based fingerprint encoder. Experiments demonstrate significant improvements over state-of-the-art methods in antibiotic property prediction and domain-leading performance across multiple downstream antibiotic screening tasks.
Abstract
Due to the rise in antimicrobial resistance, identifying novel compounds with antibiotic potential is crucial for combating this global health issue. However, traditional drug development methods are costly and inefficient. Recognizing the pressing need for more effective solutions, researchers have turned to machine learning techniques to streamline the prediction and development of novel antibiotic compounds. While foundation models have shown promise in antibiotic discovery, current mainstream efforts still fall short of fully leveraging the potential of multimodal molecular data. Recent studies suggest that contrastive learning frameworks utilizing multimodal data exhibit excellent performance in representation learning across various domains. Building upon this, we introduce CL-MFAP, an unsupervised contrastive learning (CL)-based multimodal foundation (MF) model specifically tailored for discovering small molecules with potential antibiotic properties (AP) using three types of molecular data. This model employs 1.6 million bioactive molecules with drug-like properties from the ChEMBL dataset to jointly pretrain three encoders under a contrastive learning objective: (1) a transformer-based encoder with rotary position embedding for processing SMILES strings; (2) a second transformer-based encoder, incorporating a novel bi-level routing attention mechanism, to handle molecular graph representations; and (3) a Morgan fingerprint encoder based on a multilayer perceptron. CL-MFAP outperforms baseline models in antibiotic property prediction by effectively utilizing different molecular modalities and demonstrates superior domain-specific performance when fine-tuned for antibiotic-related property prediction tasks.
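To make the tri-modal contrastive setup concrete, the sketch below illustrates the general shape of the pretraining objective: three modality encoders map the same molecule into a shared embedding space, and pairwise InfoNCE-style losses pull matching embeddings together across modalities. This is a minimal, hypothetical sketch in pure Python, not the authors' implementation: the real RoPE transformer, bi-level routing graph transformer, and fingerprint MLP are replaced by stand-in functions, and the batch of molecule ids is invented for illustration.

```python
import math
import random

DIM = 8  # embedding dimension (illustrative only)

def fake_encoder(seed_offset):
    """Stand-in for one modality encoder: deterministically maps a
    molecule id to a unit-norm embedding vector. In CL-MFAP this would
    be the SMILES transformer, graph transformer, or fingerprint MLP."""
    def encode(mol_id):
        rng = random.Random(mol_id * 31 + seed_offset)
        v = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]
    return encode

smiles_enc = fake_encoder(1)  # stands in for the RoPE transformer
graph_enc = fake_encoder(2)   # stands in for the bi-level routing transformer
fp_enc = fake_encoder(3)      # stands in for the Morgan-fingerprint MLP

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: each anchor embedding should be most similar to the
    positive at the same batch index, against all other positives."""
    loss = 0.0
    for i, a in enumerate(anchors):
        sims = [sum(x * y for x, y in zip(a, p)) / temperature
                for p in positives]
        log_z = math.log(sum(math.exp(s) for s in sims))
        loss += log_z - sims[i]  # -log softmax of the matching pair
    return loss / len(anchors)

batch = [101, 102, 103, 104]  # hypothetical molecule ids
s = [smiles_enc(m) for m in batch]
g = [graph_enc(m) for m in batch]
f = [fp_enc(m) for m in batch]

# One possible combination: align all three modality pairs symmetrically.
total = info_nce(s, g) + info_nce(s, f) + info_nce(g, f)
print(round(total, 4))
```

During actual pretraining the encoders would be trained jointly so that the three views of each molecule converge in the shared space; here the stand-in encoders are fixed, so the printed loss simply shows the objective being evaluated on one batch.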