🤖 AI Summary
To address the insufficient integration of local structural modeling and global biochemical knowledge in molecular representation learning, this paper proposes GODE, a dual-graph neural network framework that jointly models molecular graphs and biochemical knowledge graphs. GODE is the first to explicitly incorporate multi-source knowledge graph substructures into molecular pretraining. It achieves deep synergy between chemical structure and biological semantics through cross-level contrastive learning, knowledge graph embedding fusion, and molecular graph self-supervised pretraining. Evaluated on 11 molecular property prediction tasks, GODE consistently outperforms state-of-the-art methods, improving average ROC-AUC by 12.7% on classification tasks and reducing average RMSE/MAE by 34.4% on regression tasks. Its core contribution is a molecule–knowledge dual-graph contrastive learning paradigm, establishing a novel, interpretable, and knowledge-enhanced framework for molecular representation learning.
📝 Abstract
Molecular representation learning is vital for various downstream applications, including the analysis and prediction of molecular properties and side effects. While Graph Neural Networks (GNNs) have been a popular framework for modeling molecular data, they often struggle to capture the full complexity of molecular representations. In this paper, we introduce a novel method called GODE, which accounts for the dual-level structure inherent in molecules: a molecule possesses an intrinsic graph structure and simultaneously functions as a node within a broader molecular knowledge graph. GODE integrates individual molecular graph representations with multi-domain biochemical data from knowledge graphs. By pre-training two GNNs on the two graph structures and employing contrastive learning, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures. This fusion yields a more robust and informative representation, enhancing molecular property prediction by leveraging both chemical and biological information. When fine-tuned across 11 chemical property tasks, our model significantly outperforms existing baselines, achieving an average ROC-AUC improvement of 12.7% on classification tasks and an average RMSE/MAE improvement of 34.4% on regression tasks. Notably, GODE surpasses the current leading model in property prediction, with gains of 2.2% in classification and 7.2% in regression tasks.
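The cross-level contrastive objective described above pulls together the two views of the same molecule: its molecular-graph embedding and the embedding of its knowledge-graph substructure. The paper does not spell out the exact loss here, so the sketch below uses a standard symmetric InfoNCE formulation as an illustration; the function name `info_nce`, the temperature value, and the NumPy setup are assumptions, not the authors' implementation.

```python
import numpy as np

def info_nce(mol_emb, kg_emb, temperature=0.1):
    """Symmetric InfoNCE loss between molecular-graph embeddings and
    knowledge-graph substructure embeddings. Rows of the two arrays
    are assumed to be paired views of the same molecule (the positives);
    all other rows in the batch act as negatives.

    Illustrative sketch only -- not the paper's actual implementation.
    """
    # L2-normalize so dot products become cosine similarities
    m = mol_emb / np.linalg.norm(mol_emb, axis=1, keepdims=True)
    k = kg_emb / np.linalg.norm(kg_emb, axis=1, keepdims=True)
    logits = (m @ k.T) / temperature      # (N, N); positives on the diagonal
    idx = np.arange(len(m))

    def xent(lg):
        # cross-entropy of each row against its diagonal (positive) entry
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average over both directions: molecule->KG and KG->molecule
    return 0.5 * (xent(logits) + xent(logits.T))
```

In practice the two embedding arrays would come from the two pre-trained GNN encoders; minimizing this loss aligns each molecule's structural view with its biochemical-knowledge view, which is the fusion the abstract describes.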