ChemDFM-X: Towards Large Multimodal Model for Chemistry

📅 2024-09-20
🏛️ Science China Information Sciences
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing models struggle to unify heterogeneous chemical data (molecular structures, spectroscopic profiles, textual descriptions, and reaction equations) and do not fully harness the capabilities of large language models.
Method: We introduce the first large-scale multimodal foundation model tailored for chemistry. It enables end-to-end joint modeling of SMILES, InChI, molecular graphs, IR/NMR spectra, and natural language. The approach combines a chemistry-aware cross-modal alignment mechanism with a domain-adaptive pretraining paradigm, integrating a GNN-based molecular encoder, a CNN-based spectral branch, and multimodal adapters. Pretraining uses contrastive learning and masked modality reconstruction.
Contribution/Results: The model achieves an average 9.3% improvement across 12 downstream tasks compared with unimodal baselines, supports zero-shot cross-modal inference, and attains 68.7% Top-1 accuracy on the USPTO-50K retrosynthesis prediction benchmark.
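The summary names contrastive learning as a pretraining objective but does not state a loss. As a minimal sketch, assuming a symmetric InfoNCE-style objective over paired molecule and spectrum embeddings (a common choice for cross-modal alignment, not confirmed as this paper's method), the idea can be illustrated in plain Python. The function name `symmetric_info_nce`, the temperature value, and the embedding inputs are all hypothetical.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean of -log softmax(row_i)[i]: item i should match partner i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(mol_emb, spec_emb, temperature=0.07):
    """Hypothetical contrastive loss for paired embeddings.

    mol_emb[i] and spec_emb[i] are assumed to describe the same molecule;
    every other pairing in the batch is treated as a negative.
    """
    if len(mol_emb) != len(spec_emb) or not mol_emb:
        raise ValueError("need two non-empty batches of equal size")
    mols = [_normalize(v) for v in mol_emb]
    specs = [_normalize(v) for v in spec_emb]
    # Cosine similarity between every molecule and every spectrum.
    logits = [
        [sum(a * b for a, b in zip(m, s)) / temperature for s in specs]
        for m in mols
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the molecule-to-spectrum and spectrum-to-molecule directions.
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))
```

Calling `symmetric_info_nce` on a batch whose pairs are correctly aligned should give a lower loss than on the same batch with the spectra shuffled, which is the signal such an objective would push the encoders toward.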

Problem

Research questions and friction points this paper is trying to address.

Chemical Domain
Multimodal Models
Cross-modal General Intelligence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modal Chemical Dialogue Model
ChemDFM-X
Universal Chemical Intelligence

Authors

Zihan Zhao
Shanghai Jiao Tong University
NLP
Bo Chen
Suzhou Laboratory, Suzhou 215123, China
Jingpiao Li
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China; Suzhou Laboratory, Suzhou 215123, China
Lu Chen
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China; Suzhou Laboratory, Suzhou 215123, China
Liyang Wen
Suzhou Laboratory, Suzhou 215123, China
Pengyu Wang
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China; Suzhou Laboratory, Suzhou 215123, China
Zichen Zhu
Shanghai Jiao Tong University
GUI agents, multimodal large models, human-computer interaction
Danyang Zhang
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China
Ziping Wan
Suzhou Laboratory, Suzhou 215123, China
Yansi Li
Shanghai Jiao Tong University
Large Language Models, Reasoning, GUI Agents
Zhongyang Dai
Suzhou Laboratory, Suzhou 215123, China
Xin Chen
Suzhou Laboratory, Suzhou 215123, China
Kai Yu
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China; Suzhou Laboratory, Suzhou 215123, China