🤖 AI Summary
Current models for predicting catalytic adsorption-configuration energies (e.g., CatBERTa, GAP-CATBERTa) suffer from limited accuracy and poor configurational discrimination, undermining the reliability of machine learning–driven catalyst screening. To address this, we propose a deeply fused graph–language multimodal foundation model featuring a novel graph–text alignment mechanism that explicitly injects 3D geometric information into the language pathway. By integrating the Qwen large language model with the E(3)-equivariant graph transformer Equiformer-V2, the model jointly encodes atomic-scale 3D structures and structured textual representations, simultaneously supporting high-accuracy relaxed adsorption energy prediction and autoregressive CIF file generation. On the OC20 dataset, it achieves a mean absolute error (MAE) of 0.486 eV for relaxed adsorption-energy prediction, substantially outperforming existing baselines. This work establishes a new paradigm for inverse catalytic design grounded in unified multimodal representation learning.
📝 Abstract
Adsorption energy is a key descriptor of catalytic reactivity. It is fundamentally defined as the difference between the relaxed total energy of the adsorbate–surface system and that of an appropriate reference state; the accuracy of relaxed-energy prediction therefore directly determines the reliability of machine-learning-driven catalyst screening. E(3)-equivariant graph neural networks (GNNs) operate natively on three-dimensional atomic coordinates under periodic boundary conditions and have demonstrated strong performance on such tasks. Language-model-based approaches, by contrast, enable human-readable textual descriptions and reduce reliance on explicit graph construction, thereby broadening applicability; however, they remain insufficient both in adsorption-configuration energy prediction accuracy and in distinguishing "the same system with different configurations," even with graph-assisted pretraining in the style of GAP-CATBERTa.
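For concreteness, the definition above is conventionally written (standard notation, not spelled out in this abstract) as

$$ E_\mathrm{ads} = E_\mathrm{sys}^\mathrm{relaxed} - \left( E_\mathrm{slab} + E_\mathrm{adsorbate} \right), $$

where $E_\mathrm{sys}^\mathrm{relaxed}$ is the relaxed total energy of the combined adsorbate–surface system and the bracketed terms form the reference state (clean relaxed slab plus isolated gas-phase adsorbate).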
To this end, we propose QE-Catalytic, a multimodal framework that deeply couples a large language model (**Q**wen) with an E(3)-equivariant graph Transformer (**E**quiformer-V2), enabling unified support for adsorption-configuration property prediction and inverse design on complex catalytic surfaces. During prediction, QE-Catalytic jointly leverages three-dimensional structures and structured configuration text, and injects 3D geometric information into the language channel via graph–text alignment, allowing it to function as a high-performance text-based predictor when precise coordinates are unavailable; it can also autoregressively generate CIF files for target-energy-driven structure design and information completion. On OC20, QE-Catalytic reduces the MAE of relaxed adsorption energy from 0.713 eV to 0.486 eV, and consistently outperforms baselines such as CatBERTa and GAP-CATBERTa across multiple evaluation protocols.
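As an illustration only (the paper's actual alignment mechanism is not detailed in this abstract; the linear projection, prefix length, and dimensions below are hypothetical), the idea of injecting graph-derived 3D geometry into the language channel can be sketched as projecting a pooled graph embedding into the LLM's token-embedding space and prepending it as soft prefix tokens:

```python
import numpy as np

def inject_graph_prefix(graph_emb, text_tokens, W, n_prefix=4):
    """Hypothetical sketch of graph-text alignment: map a pooled graph
    embedding into the language model's embedding space and prepend it
    to the text token embeddings as soft prefix tokens.

    graph_emb:   (graph_dim,)      pooled output of the 3D graph encoder
    text_tokens: (T, llm_dim)      token embeddings of the configuration text
    W:           (graph_dim, n_prefix * llm_dim)  learned projection (random here)
    """
    llm_dim = text_tokens.shape[1]
    prefix = (graph_emb @ W).reshape(n_prefix, llm_dim)   # soft prefix tokens
    return np.concatenate([prefix, text_tokens], axis=0)  # (n_prefix + T, llm_dim)

rng = np.random.default_rng(0)
graph_emb = rng.standard_normal(256)           # e.g., from an Equiformer-V2-style encoder
text_tokens = rng.standard_normal((16, 1024))  # e.g., from the LLM's embedding layer
W = rng.standard_normal((256, 4 * 1024))
fused = inject_graph_prefix(graph_emb, text_tokens, W)
print(fused.shape)  # (20, 1024)
```

In this sketch the language model would then attend over the fused sequence, so textual tokens can condition on geometric information even when no explicit graph message passing occurs in the language pathway.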