L²M³OF: A Large Language Multimodal Model for Metal-Organic Frameworks

📅 2025-10-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Metal–organic frameworks (MOFs) combine intricate crystal structures with implicit, domain-specific knowledge, which makes effective representation learning difficult for conventional, text-only large language models (LLMs).
Method: We propose the first MOF-oriented multimodal large language model, built on a "structure→semantic token" compression mechanism. Crystal structures are encoded by a pretrained crystal encoder, and a lightweight projection layer compresses the encoding into tokens in the language model's input space, so that structure is processed jointly with textual descriptions and domain knowledge. Multimodal alignment training then aligns these structure tokens with language instructions.
Contribution/Results: Experiments show that the model outperforms state-of-the-art closed-source LLMs (e.g., GPT-5) on MOF property prediction and knowledge generation tasks while using far fewer parameters. The results point to the limits of text-only LLMs in materials science and position multimodal models as a foundation for AI-driven functional materials design.
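The summary does not spell out how structure tokens are formed. The sketch below, in plain Python, shows one plausible design under stated assumptions. Per-atom embeddings from a crystal encoder are mean-pooled into a fixed number of groups, each pooled vector is passed through a single linear projection into the LLM's embedding dimension, and the resulting structure tokens are prefixed to the embedded instruction tokens. The pooling scheme, the dimensions, and the names `mean_pool_groups`, `LinearProjector` and `structure_to_tokens` are hypothetical. This is not L2M3OF's actual implementation. Random vectors stand in for both the pretrained encoder output and the text embeddings.

```python
import random
from typing import List

Vector = List[float]


def mean_pool_groups(atom_embeddings: List[Vector], num_tokens: int) -> List[Vector]:
    """Compress N per-atom embeddings into num_tokens vectors by averaging
    contiguous groups. Hypothetical: the paper's actual compression may differ."""
    n = len(atom_embeddings)
    if not 0 < num_tokens <= n:
        raise ValueError("num_tokens must be between 1 and the number of atoms")
    dim = len(atom_embeddings[0])
    pooled = []
    for t in range(num_tokens):
        # Integer split guarantees every group is non-empty when num_tokens <= n.
        group = atom_embeddings[t * n // num_tokens:(t + 1) * n // num_tokens]
        pooled.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return pooled


class LinearProjector:
    """A single linear layer mapping encoder_dim -> llm_dim (randomly initialised here)."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = in_dim ** -0.5
        self.weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def structure_to_tokens(atom_embeddings: List[Vector], projector: LinearProjector,
                        num_tokens: int) -> List[Vector]:
    """Pool the crystal-encoder output, then project each pooled vector into the LLM space."""
    return [projector(v) for v in mean_pool_groups(atom_embeddings, num_tokens)]


# Toy dimensions (all hypothetical): 12 atoms, encoder dim 8, LLM dim 16, 4 structure tokens.
rng = random.Random(42)
atom_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(12)]
# Stand-in for 5 embedded instruction tokens.
text_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(5)]

projector = LinearProjector(in_dim=8, out_dim=16)
structure_tokens = structure_to_tokens(atom_embeddings, projector, num_tokens=4)
# Prefix the structure tokens to the text tokens (an assumed sequence layout).
sequence = structure_tokens + text_embeddings
print(len(sequence), len(sequence[0]))  # 9 16
```

In LLaVA-style models a projection of this kind is typically the component trained during an alignment stage. Whether L2M3OF freezes its crystal encoder, and how many structure tokens it uses, is not stated on this page.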


📝 Abstract
Large language models have demonstrated remarkable reasoning capabilities across diverse natural language tasks. However, comparable breakthroughs in scientific discovery are more limited, because understanding complex physical phenomena demands multifaceted representations far beyond language alone. A compelling example is the design of functional materials such as MOFs, which are critical for a range of impactful applications like carbon capture and hydrogen storage. Navigating their vast and intricate design space in language-based representations interpretable by LLMs is challenging due to the numerous possible three-dimensional atomic arrangements and strict reticular rules of coordination geometry and topology. Despite promising early results in LLM-assisted discovery for simpler materials systems, MOF design remains heavily reliant on tacit human expertise rarely codified in textual information alone. To overcome this barrier, we introduce L2M3OF, the first multimodal LLM for MOFs. L2M3OF integrates crystal representation learning with language understanding to process structural, textual, and knowledge modalities jointly. L2M3OF employs a pre-trained crystal encoder with a lightweight projection layer to compress structural information into a token space, enabling efficient alignment with language instructions. To facilitate training and evaluation, we curate a structure-property-knowledge database of crystalline materials and benchmark L2M3OF against state-of-the-art closed-source LLMs such as GPT-5, Gemini-2.5-Pro and DeepSeek-R1. Experiments show that L2M3OF outperforms leading text-based closed-source LLMs in property prediction and knowledge generation tasks, despite using far fewer parameters. These results highlight the importance of multimodal approaches for porous material understanding and establish L2M3OF as a foundation for next-generation AI systems in materials discovery.
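To make the two evaluation tasks concrete, the sketch below shows one hypothetical way a structure-property-knowledge record could be turned into instruction-tuning pairs for property prediction and knowledge generation. A `<structure>` placeholder marks where projected structure tokens would be spliced in, and a loss mask supervises only the answer tokens. The `MOFRecord` schema, the prompt templates, the placeholder values (`example-MOF`, `<PLD>`, `<VF>`) and the whitespace tokenization are assumptions for illustration. The abstract does not describe the database format or the training objective.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical marker, later replaced by the projected structure tokens.
STRUCTURE_PLACEHOLDER = "<structure>"


@dataclass
class MOFRecord:
    """Hypothetical schema for one structure-property-knowledge entry."""
    name: str
    structure_id: str  # key into a separate store of crystal structures (assumed)
    properties: Dict[str, str] = field(default_factory=dict)
    knowledge: str = ""


def to_instruction_pairs(record: MOFRecord) -> List[Tuple[str, str]]:
    """Turn one record into (prompt, answer) pairs for the two task types named in the abstract."""
    pairs = []
    # Property prediction: one question per stored property, in a stable order.
    for prop, value in sorted(record.properties.items()):
        prompt = f"{STRUCTURE_PLACEHOLDER} What is the {prop} of {record.name}?"
        pairs.append((prompt, value))
    # Knowledge generation: a free-text description, if the record has one.
    if record.knowledge:
        prompt = f"{STRUCTURE_PLACEHOLDER} Describe {record.name} and its typical applications."
        pairs.append((prompt, record.knowledge))
    return pairs


def loss_mask(prompt: str, answer: str) -> List[int]:
    """Whitespace-token mask: 0 for prompt tokens, 1 for answer tokens (assumed convention)."""
    return [0] * len(prompt.split()) + [1] * len(answer.split())


# Placeholder values only; no real data.
record = MOFRecord(
    name="example-MOF",
    structure_id="struct-0001",
    properties={"pore limiting diameter": "<PLD>", "void fraction": "<VF>"},
    knowledge="<curated description text>",
)
pairs = to_instruction_pairs(record)
for prompt, answer in pairs:
    print(prompt, "->", answer)
print(loss_mask(*pairs[0]))
```

Masking prompt tokens out of the loss is a common instruction-tuning convention. A real pipeline would use the LLM's own tokenizer rather than `str.split`.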
Problem

Research questions and friction points this paper is trying to address.

Designing functional metal-organic frameworks with complex 3D atomic arrangements
Overcoming reliance on human expertise not captured in text alone
Integrating structural, textual, and knowledge data for material discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates crystal representation learning with language understanding
Compresses structural information into a token space via a lightweight projection layer
Multimodal approach for porous material understanding
Authors

Jiyu Cui
Department of Chemistry, University of Liverpool

Fang Wu
Department of Computer Science, Stanford University

Haokai Zhao
University of New South Wales
Deep Learning

Minggao Feng
Department of Chemistry, University of Liverpool

Xenophon Evangelopoulos
Department of Chemistry, University of Liverpool

Andrew I. Cooper
University of Liverpool
Materials Chemistry

Yejin Choi
Stanford University / NVIDIA
Natural Language Processing, Deep Learning, Artificial Intelligence, Commonsense Reasoning