🤖 AI Summary
The vast structural space of metal–organic frameworks (MOFs) and the prohibitive computational cost of density functional theory (DFT) and molecular simulations hinder scalable inverse MOF design. Method: This work introduces a chemistry-aware, string-based generative framework that integrates a GPT-based language model trained on MOFid strings, the MOFormer property predictor, and proximal policy optimization (PPO) reinforcement learning, enabling end-to-end, property-guided de novo design of synthesizable MOFs. Contribution/Results: The MOFid encoding unifies building-block connectivity and topological information in a single string, and the authors present this as the first systematic application of RL-enhanced generative language models to MOF inverse design. Experiments show >92% topological validity, high synthetic feasibility, and better target-property performance (including gas adsorption and electrical conductivity) than random sampling and VAE baselines; a single inference pass yields hundreds of high-quality candidate structures.
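For context, the MOFid scheme encodes a MOF as SMILES strings for its building blocks joined by ".", followed by codes for the underlying net topology and catenation. The line below is an illustrative, approximate example for MOF-5 (Zn₄O nodes, BDC linkers, pcu net), not a string taken from the paper:

```
[Zn][O]([Zn])([Zn])[Zn].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.pcu.cat0;MOF-5
```

Because every field is plain text, such strings can be tokenized and modeled by a language model just like ordinary sequences.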
📝 Abstract
The discovery of Metal–Organic Frameworks (MOFs) with application-specific properties remains a central challenge in materials chemistry, owing to the immense size and complexity of their structural design space. Conventional computational screening techniques such as molecular simulations and density functional theory (DFT), while accurate, are computationally prohibitive at scale. Machine learning offers an exciting alternative by leveraging data-driven approaches to accelerate materials discovery. The complexity of MOFs, with their extended periodic structures and diverse topologies, creates both opportunities and challenges for generative modeling. To address these challenges, we present a reinforcement learning (RL)-enhanced, transformer-based framework for the de novo design of MOFs. Central to our approach is MOFid, a chemically informed string representation encoding both connectivity and topology, which enables scalable generative modeling. Our pipeline comprises three components: (1) a generative GPT model trained on MOFid sequences, (2) MOFormer, a transformer-based property predictor, and (3) an RL module that optimizes generated candidates via property-guided reward functions. By integrating property feedback into sequence generation, our method steers the model toward synthesizable, topologically valid MOFs with desired functional attributes. This work demonstrates the potential of large language models, when coupled with reinforcement learning, to accelerate inverse design in reticular chemistry and unlock new frontiers in computational MOF discovery.
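The three-component pipeline described above can be sketched in miniature. The toy code below is a hedged illustration, not the paper's implementation: the real system fine-tunes a GPT model with PPO against MOFormer predictions, whereas here a context-free token sampler plays the generator, a hand-written scoring function stands in for the property predictor, and a simplified REINFORCE update (no PPO clipping or KL penalty) plays the RL module. All names (`ToyGenerator`, `predict_property`, `VOCAB`) are hypothetical.

```python
import math
import random

random.seed(0)

# Toy vocabulary standing in for MOFid tokens (metal nodes, linkers, nets).
VOCAB = ["Zn", "Cu", "BDC", "BTC", "pcu", "tbo"]

def predict_property(seq):
    """Stand-in for the property predictor (MOFormer's role): a fixed score
    that rewards sequences containing a Cu node and a pcu topology."""
    return 1.0 * ("Cu" in seq) + 0.5 * ("pcu" in seq)

class ToyGenerator:
    """Context-free categorical sampler over VOCAB (the GPT model's role)."""
    def __init__(self):
        self.logits = {tok: 0.0 for tok in VOCAB}

    def probs(self):
        z = sum(math.exp(v) for v in self.logits.values())
        return {t: math.exp(v) / z for t, v in self.logits.items()}

    def sample(self, length=3):
        p = self.probs()
        toks, weights = zip(*p.items())
        return random.choices(toks, weights=weights, k=length)

def policy_gradient_step(gen, batch=64, lr=0.5):
    """One RL step (the PPO module's role, reduced to plain REINFORCE):
    raise the logits of tokens appearing in sequences whose predicted
    property beats the batch-average baseline."""
    samples = [gen.sample() for _ in range(batch)]
    rewards = [predict_property(s) for s in samples]
    baseline = sum(rewards) / batch
    for seq, r in zip(samples, rewards):
        advantage = r - baseline
        for tok in seq:
            gen.logits[tok] += lr * advantage / batch

gen = ToyGenerator()
untrained_avg = sum(predict_property(gen.sample()) for _ in range(500)) / 500
for _ in range(200):
    policy_gradient_step(gen)
trained_avg = sum(predict_property(gen.sample()) for _ in range(500)) / 500
```

Even in this stripped-down loop, the key design choice of the pipeline is visible: property feedback enters generation only through a scalar reward, so the predictor can be retargeted (e.g., from gas adsorption to conductivity) without retraining the generator from scratch.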