🤖 AI Summary
To address the limited physical interpretability and generalization capability of SMILES-based representations in chemical modeling, this paper proposes a fingerprint-driven bimodal language–graph joint modeling framework. Methodologically, it introduces the first approach that serializes chemical fingerprints into input sequences for RoBERTa to learn semantic representations, while jointly training graph neural networks (GIN, GCN, and Graphormer) to model molecular topology. This architecture bridges the physical interpretability of fingerprint-based features with the topological reasoning capacity of graph representations, enabling end-to-end prediction of molecular physicochemical properties (e.g., QSAR bioactivity, NMR chemical shifts). Experimental results demonstrate that the model significantly outperforms unimodal baselines across multiple benchmark tasks, achieving superior prediction accuracy and cross-dataset generalization. The framework establishes a novel, interpretable, and scalable paradigm for molecular property prediction.
📝 Abstract
In recent years, machine learning has profoundly reshaped the field of chemistry, facilitating significant advancements across various applications, including the prediction of molecular properties and the generation of molecular structures. Language models and graph-based models are extensively utilized within this domain, consistently achieving state-of-the-art results across an array of tasks. However, the prevailing practice of representing chemical compounds in the SMILES format -- used by most datasets and many language models -- presents notable limitations as a training data format. In contrast, chemical fingerprints offer a more physically informed representation of compounds, making them better suited for model training. This study develops a language model trained directly on fingerprints. Furthermore, we introduce a bimodal architecture that integrates this language model with a graph model. Our proposed methodology synthesizes these approaches, utilizing RoBERTa as the language model and employing Graph Isomorphism Networks (GIN), Graph Convolutional Networks (GCN), and Graphormer as graph models. This integration yields a significant improvement in predictive performance over conventional strategies on tasks such as Quantitative Structure-Activity Relationship (QSAR) modeling and the prediction of nuclear magnetic resonance (NMR) spectra, among others.
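The core idea of the abstract can be illustrated in miniature: a binary fingerprint is serialized into a token sequence a RoBERTa-style model could consume, and the resulting language embedding is fused with a graph embedding before a prediction head. The serialization scheme (one token per active bit) and the late-fusion-by-concatenation step below are plausible assumptions for a sketch, not the paper's exact implementation; all names are hypothetical.

```python
# Minimal sketch of fingerprint serialization and bimodal fusion.
# Assumptions (not from the paper): active bits become "bit_<i>" tokens,
# and the two modality embeddings are fused by simple concatenation.

def serialize_fingerprint(bits):
    """Turn a binary fingerprint into a token sequence of its active bits,
    e.g. [0, 1, 0, 1] -> 'bit_1 bit_3', ready for a subword tokenizer."""
    return " ".join(f"bit_{i}" for i, b in enumerate(bits) if b)

def fuse(lm_embedding, graph_embedding):
    """Late fusion by concatenation: one plausible way to combine a
    language-model embedding with a GNN graph embedding."""
    return lm_embedding + graph_embedding  # list concatenation

fp = [0, 1, 0, 0, 1, 1, 0, 1]        # toy 8-bit fingerprint
tokens = serialize_fingerprint(fp)    # -> "bit_1 bit_4 bit_5 bit_7"
lm_vec = [0.2, 0.8]                   # stand-in for a RoBERTa [CLS] vector
gnn_vec = [0.5, 0.1]                  # stand-in for a GIN graph readout
joint = fuse(lm_vec, gnn_vec)         # 4-dim joint representation
print(tokens)
print(joint)
```

In practice the stand-in vectors would come from the trained RoBERTa encoder and the chosen GNN (GIN, GCN, or Graphormer), and the joint vector would feed a task-specific head for QSAR or NMR prediction.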