CatalyticMLLM: A Graph-Text Multimodal Large Language Model for Catalytic Materials

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Catalytic materials research traditionally decouples property prediction from inverse design, which leads to inconsistent representation spaces, data distribution shifts, and unstable closed-loop optimization. This work proposes QE-Catalytic-V2, a unified graph–text multimodal large language model that, for the first time, jointly models three-dimensional crystal structures (in CIF format) and textual information within a single framework. By sharing representations and co-training property prediction with structure generation, the method supports a stable closed-loop optimization pipeline of inverse design, prediction, screening, and iterative redesign. On relaxed-energy prediction and inverse design tasks, QE-Catalytic-V2 outperforms decoupled baselines, which supports the unified modeling paradigm.

📝 Abstract

Property prediction and inverse structural design of catalytic materials are typically modeled as two independent tasks: the former predicts target properties from given structures, whereas the latter generates candidate structures according to desired properties. Although the decoupled paradigm facilitates the implementation of a "generation–evaluation–screening" workflow, the inconsistency between the generative model and the property prediction model in terms of representation spaces and training objectives can readily introduce data distribution shifts and evaluator bias, thereby limiting the stability of closed-loop optimization. In this work, we propose QE-Catalytic-V2, a unified graph–text multimodal large language model for catalytic materials, which integrates property prediction and inverse design within the same model and shared representation space. Under this unified framework, QE-Catalytic-V2 can not only perform reliable property prediction by leveraging three-dimensional structures and textual information, but also generate and screen physically feasible CIF candidates conditioned on target properties, thereby forming a closed-loop optimization workflow of "inverse design–prediction–screening–redesign." Experimental results demonstrate that this unified paradigm outperforms decoupled baselines on both catalytic relaxed-energy prediction and inverse design tasks, validating the effectiveness of jointly modeling property prediction and structure generation within a single multimodal model.
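The "inverse design, prediction, screening, redesign" loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `closed_loop`, `make_toy_generate`, and `toy_predict` are hypothetical names, the candidates are plain number strings standing in for CIF text, and the toy generator and predictor are not the QE-Catalytic-V2 model. In the unified setting the abstract describes, both callables would be backed by the same model and representation space.

```python
import random
from typing import Callable, List, Tuple

Candidate = str  # stand-in for CIF text (assumption for this sketch)


def closed_loop(
    generate: Callable[[float, List[Candidate], int], List[Candidate]],
    predict: Callable[[Candidate], float],
    target: float,
    rounds: int = 3,
    n_candidates: int = 8,
    keep: int = 2,
) -> List[Tuple[float, Candidate]]:
    """Run generate -> predict -> screen -> redesign for a fixed number of rounds.

    Returns the `keep` best (absolute error, candidate) pairs, best first.
    """
    kept: List[Tuple[float, Candidate]] = []
    for _ in range(rounds):
        # Redesign: the survivors of the previous round seed the next generation.
        seeds = [cand for _, cand in kept]
        candidates = generate(target, seeds, n_candidates)
        # Prediction: score each candidate by its distance from the target property.
        scored = [(abs(predict(c) - target), c) for c in candidates]
        # Screening: keep only the closest candidates seen so far.
        kept = sorted(kept + scored)[:keep]
    return kept


def make_toy_generate(seed: int):
    """Build a toy generator that perturbs seed candidates with Gaussian noise."""
    rng = random.Random(seed)

    def toy_generate(target: float, seeds: List[Candidate], n: int) -> List[Candidate]:
        centers = [float(s) for s in seeds] or [0.0]
        return [f"{rng.choice(centers) + rng.gauss(0.0, 0.5):.4f}" for _ in range(n)]

    return toy_generate


def toy_predict(candidate: Candidate) -> float:
    """Toy property predictor: the 'structure' string encodes its own energy."""
    return float(candidate)


if __name__ == "__main__":
    for error, cand in closed_loop(make_toy_generate(0), toy_predict, target=-1.2):
        print(f"candidate={cand} abs_error={error:.4f}")
```

Because the screening step always merges new candidates with the previous survivors, the best error cannot get worse from one round to the next. Whether a real generator and predictor agree well enough for this loop to stay stable is the question the unified model is meant to address.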

Problem

Research questions and friction points this paper is trying to address.

catalytic materials

property prediction

inverse design

multimodal learning

closed-loop optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal large language model

unified framework

inverse design