Broadening Discovery through Structural Models: Multimodal Combination of Local and Structural Properties for Predicting Chemical Features

📅 2025-02-25
🤖 AI Summary
To address the limited physical interpretability and generalization of SMILES-based representations in chemical modeling, this paper proposes a fingerprint-driven bimodal language-graph framework. Methodologically, it introduces the first approach that serializes chemical fingerprints as input sequences to RoBERTa for semantic representation learning, while jointly training graph neural networks (GIN, GCN, and Graphormer) to model molecular topology. The architecture couples the physical interpretability of fingerprint-based features with the topological reasoning capacity of graph representations, enabling end-to-end prediction of molecular physicochemical properties (e.g., QSAR bioactivity, NMR chemical shifts). Experiments show that the model significantly outperforms unimodal baselines across multiple benchmark tasks in both prediction accuracy and cross-dataset generalization. The framework thus offers an interpretable and scalable paradigm for molecular property prediction.
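The fingerprint-serialization idea described above can be sketched in a few lines: the on-bits of a binary Morgan-style fingerprint are rewritten as a token sequence that a language model such as RoBERTa could consume. The token format (`bit_<index>`) and the special tokens are assumptions for illustration, not the paper's actual vocabulary.

```python
# Hypothetical sketch: serialize a binary fingerprint into an LM token sequence.
# Token names ("bit_<i>", "<s>", "</s>") are illustrative assumptions.

def serialize_fingerprint(bits):
    """Convert a binary fingerprint into a token sequence for a language model.

    Each set bit becomes one token, preserving which substructure keys fired,
    wrapped in BOS/EOS markers as RoBERTa-style models expect.
    """
    return ["<s>"] + [f"bit_{i}" for i, b in enumerate(bits) if b] + ["</s>"]

# Toy 8-bit fingerprint with bits 0, 3, and 5 set:
tokens = serialize_fingerprint([1, 0, 0, 1, 0, 1, 0, 0])
print(tokens)  # ['<s>', 'bit_0', 'bit_3', 'bit_5', '</s>']
```

In practice the fingerprint would come from a cheminformatics toolkit (e.g. RDKit's Morgan fingerprints), and the resulting token sequence is what the language-model branch trains on.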

📝 Abstract
In recent years, machine learning has profoundly reshaped the field of chemistry, facilitating significant advancements across various applications, including the prediction of molecular properties and the generation of molecular structures. Language models and graph-based models are extensively utilized within this domain, consistently achieving state-of-the-art results across an array of tasks. However, the prevailing practice of representing chemical compounds in the SMILES format -- used by most datasets and many language models -- presents notable limitations as a training data format. In contrast, chemical fingerprints offer a more physically informed representation of compounds, thereby enhancing their suitability for model training. This study aims to develop a language model that is specifically trained on fingerprints. Furthermore, we introduce a bimodal architecture that integrates this language model with a graph model. Our proposed methodology synthesizes these approaches, utilizing RoBERTa as the language model and employing Graph Isomorphism Networks (GIN), Graph Convolutional Networks (GCN), and Graphormer as graph models. This integration results in a significant improvement in predictive performance compared to conventional strategies for tasks such as Quantitative Structure-Activity Relationship (QSAR) modeling and the prediction of nuclear magnetic resonance (NMR) spectra, among others.
Problem

Research questions and friction points this paper is trying to address.

Predicting chemical features using multimodal models
Overcoming SMILES format limitations in chemistry
Integrating language and graph models for QSAR prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language model trained on chemical fingerprints
Bimodal architecture combining language and graph models
Integration of RoBERTa with GIN, GCN, Graphormer
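The bimodal combination named in the bullets above can be sketched as a toy late-fusion step: a fixed-size embedding from the fingerprint language model and one from the graph model are concatenated and fed to a linear prediction head. The concatenation fusion, the dimensions, and the function name are assumptions for illustration; the paper's actual fusion mechanism may differ.

```python
# Hypothetical late-fusion sketch for the bimodal architecture.
# lm_emb would come from RoBERTa over the serialized fingerprint;
# gnn_emb from a GIN/GCN/Graphormer encoder of the molecular graph.

def fuse_and_predict(lm_emb, gnn_emb, weights, bias=0.0):
    """Concatenate the two modality embeddings and apply a linear head."""
    joint = lm_emb + gnn_emb  # list concatenation -> joint feature vector
    assert len(weights) == len(joint), "head weights must match joint dim"
    return sum(w * x for w, x in zip(weights, joint)) + bias

# Toy example: 3-dim LM embedding, 2-dim GNN embedding, 5 head weights.
y = fuse_and_predict([0.5, -1.0, 2.0], [1.0, 0.0],
                     weights=[0.1, 0.2, 0.3, 0.4, 0.5], bias=0.1)
print(round(y, 2))  # a single scalar prediction, e.g. a QSAR activity
```

In a real implementation both encoders and the head would be trained end-to-end (e.g. in PyTorch), so the gradient of the property-prediction loss flows into both the language-model and graph-model branches.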
Nikolai Rekut
Moscow Institute of Physics and Technology, Dolgoprudny, Russia; A. N. Nesmeyanov Institute of Organoelement compounds Russian Academy of Sciences, Moscow, Russia
Alexey Orlov
Moscow Institute of Physics and Technology, Dolgoprudny, Russia; HSE University, Moscow, Russia
Klea Ziu
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Elizaveta Starykh
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
Martin Takáč
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
Aleksandr Beznosikov
PhD, Basic Research of Artificial Intelligence Lab