🤖 AI Summary
In few-shot drug discovery, molecular property prediction (MPP) is severely limited by scarce experimental data. To address this, we propose AdaptMol, an adaptive multimodal fusion framework. AdaptMol introduces a two-level attention mechanism that dynamically coordinates SMILES sequence encoding (capturing global semantic patterns) with graph neural network (GNN)-based molecular graph representation (modeling local topological structure). It further incorporates an interpretable fusion paradigm guided by molecular active substructures, empirically validating the necessity and advantage of complementary bimodal integration. By unifying a SMILES encoder, a GNN, and a prototypical few-shot learning framework, AdaptMol achieves state-of-the-art performance across three benchmark datasets under both 5-shot and 10-shot settings. Experimental results demonstrate substantial improvements in molecular representation quality and prediction accuracy in low-data regimes, establishing a foundation for interpretable, data-efficient MPP.
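To make the prototypical few-shot setup concrete, here is a minimal, dependency-free sketch of the classification step: each class prototype is the mean of its support-set molecule embeddings, and a query molecule is assigned to the nearest prototype. Function names, the embedding dimension, and the use of squared Euclidean distance are illustrative assumptions, not AdaptMol's published implementation details.

```python
# Illustrative prototypical-network sketch (assumed details, not AdaptMol's exact code).

def prototype(support_embeddings):
    """Class prototype = element-wise mean of the support-set embeddings."""
    dim = len(support_embeddings[0])
    n = len(support_embeddings)
    return [sum(e[i] for e in support_embeddings) / n for i in range(dim)]

def classify(query, prototypes):
    """Assign the query embedding to the label of the nearest prototype."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(query, prototypes[label]))

# In a 5-shot setting, each prototype would be built from five support molecules.
active = prototype([[1.0, 0.0], [0.9, 0.1]])
inactive = prototype([[0.0, 1.0], [0.1, 0.9]])
label = classify([0.8, 0.2], {"active": active, "inactive": inactive})
```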
📝 Abstract
Accurate molecular property prediction (MPP) is a critical step in modern drug development. However, the scarcity of experimental validation data poses a significant challenge to AI-driven research paradigms. Under few-shot learning scenarios, the quality of molecular representations directly dictates the theoretical upper limit of model performance. We present AdaptMol, a prototypical network integrating Adaptive multimodal fusion for Molecular representation. This framework employs a dual-level attention mechanism to dynamically integrate global and local molecular features derived from two modalities: SMILES sequences and molecular graphs. (1) At the local level, structural features such as atomic interactions and substructures are extracted from molecular graphs, emphasizing fine-grained topological information; (2) At the global level, the SMILES sequence provides a holistic representation of the molecule. To validate the necessity of multimodal adaptive fusion, we propose an interpretable approach based on identifying molecular active substructures, demonstrating that adaptive fusion of the two modalities yields efficient molecular representations. Extensive experiments on three commonly used benchmarks under 5-shot and 10-shot settings show that AdaptMol achieves state-of-the-art performance in most cases. The rationale-extraction method guides the fusion of the two modalities and highlights the importance of each.
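The attention-based fusion described above can be sketched as follows: each modality embedding (global SMILES, local graph) receives an attention score, the scores are normalized with a softmax, and the fused representation is the weighted sum. The scoring function here (a dot product with a learned query vector) and all names are simplifying assumptions for illustration; AdaptMol's actual dual-level attention is more elaborate.

```python
import math

# Hedged sketch of attention-weighted bimodal fusion (assumed gate design,
# not AdaptMol's published architecture).

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(global_smiles_emb, local_graph_emb, query):
    """Score each modality against a learned query, then blend by softmax weights."""
    scores = [
        sum(q * g for q, g in zip(query, global_smiles_emb)),  # global score
        sum(q * l for q, l in zip(query, local_graph_emb)),    # local score
    ]
    w_global, w_local = softmax(scores)
    fused = [w_global * g + w_local * l
             for g, l in zip(global_smiles_emb, local_graph_emb)]
    return fused, (w_global, w_local)
```

Because the weights are input-dependent, the model can lean on the graph modality for molecules where local substructure dominates the property, and on the SMILES modality where global context matters more.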