🤖 AI Summary
This work addresses the activity–stability trade-off in covalent organic frameworks (COFs) for photocatalytic hydrogen evolution, which arises mainly from the hydrolytic lability of imine linkages. To overcome it, the work proposes an inverse design strategy built on Ara, a large language model (LLM)-based agent with chemical reasoning capabilities. Ara combines chemical priors (donor–acceptor theory, conjugation effects, and linkage stability hierarchies) with multi-objective screening criteria evaluated by a GFN1-xTB fragment-based pipeline, enabling efficient and interpretable COF structure discovery. Ara reaches a 52.7% hit rate, 11.5 times that of random search, significantly outperforms Bayesian optimization, and identifies its first hit at iteration 12, versus iteration 25 for random search. The study applies a chemically reasoning LLM agent to multi-objective inverse design of COFs and identifies candidates predicted to combine suitable photocatalytic band structure with hydrolytic stability.
📝 Abstract
Covalent organic frameworks (COFs) are promising photocatalysts for solar hydrogen production, yet the most electronically favorable linkages, imines, hydrolyze rapidly in water, creating a stability--activity trade-off that limits practical deployment. Navigating the combinatorial design space of nodes, linkers, linkages, and functional groups to identify candidates that are simultaneously active and durable remains a formidable challenge. Here we introduce Ara, a large-language-model (LLM) agent that leverages pretrained chemical knowledge (donor--acceptor theory, conjugation effects, and linkage stability hierarchies) to guide the search for photocatalytic COFs satisfying joint band-gap, band-edge, and hydrolytic-stability criteria. Evaluated against random search and Bayesian optimization (BO) over a search space spanning nodes, linkers, linkages, and R-groups, with candidates screened by a GFN1-xTB fragment pipeline, Ara achieves a 52.7\% hit rate (11.5$\times$ random, p = 0.006), finds its first hit at iteration 12 versus 25 for random search, and significantly outperforms BO (p = 0.006). Inspection of the agent's reasoning traces reveals interpretable chemical logic: early convergence on vinylene and beta-ketoenamine linkages for stability, node selection informed by electron-withdrawing character, and systematic R-group optimization to center the band gap at 2.0 eV. Exhaustive evaluation of the full search space uncovers a complementary exploitation--exploration trade-off between the agent and BO, suggesting that hybrid strategies may combine the strengths of both approaches. These results demonstrate that LLM chemical priors can substantially accelerate multi-criteria materials discovery.
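
To make the notion of a "hit" under joint band-gap, band-edge, and hydrolytic-stability criteria concrete, the following is a minimal sketch and not the paper's pipeline. The abstract does not state the thresholds, so the gap window, the use of HOMO/LUMO energies as band-edge proxies, and the set of linkages treated as stable are all assumptions chosen for illustration; `Candidate`, `is_hit`, and `hit_rate` are hypothetical names. The only values taken from outside the assumptions are the 2.0 eV gap target mentioned in the abstract and the standard water redox levels versus vacuum at pH 0.

```python
from dataclasses import dataclass

# ASSUMPTION: window around the 2.0 eV target named in the abstract.
GAP_WINDOW_EV = (1.8, 2.2)
# Standard redox levels vs. vacuum at pH 0.
H2_LEVEL_EV = -4.44   # H+/H2
O2_LEVEL_EV = -5.67   # O2/H2O (1.23 eV below H+/H2)
# ASSUMPTION: linkages counted as hydrolytically stable; the abstract
# names these two as the agent's stability-driven choices.
STABLE_LINKAGES = {"vinylene", "beta-ketoenamine"}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one screened COF fragment."""
    linkage: str
    homo_ev: float  # valence-band-edge proxy, eV vs. vacuum
    lumo_ev: float  # conduction-band-edge proxy, eV vs. vacuum


def is_hit(c: Candidate) -> bool:
    """Return True only if all three criteria hold at once."""
    gap = c.lumo_ev - c.homo_ev
    gap_ok = GAP_WINDOW_EV[0] <= gap <= GAP_WINDOW_EV[1]
    # Band edges must straddle both water redox levels.
    edges_ok = c.lumo_ev > H2_LEVEL_EV and c.homo_ev < O2_LEVEL_EV
    stable_ok = c.linkage in STABLE_LINKAGES
    return gap_ok and edges_ok and stable_ok


def hit_rate(candidates: list[Candidate]) -> float:
    """Fraction of evaluated candidates that are hits (0.0 if empty)."""
    if not candidates:
        return 0.0
    return sum(is_hit(c) for c in candidates) / len(candidates)


if __name__ == "__main__":
    evaluated = [
        Candidate("vinylene", -5.9, -3.9),          # passes all criteria
        Candidate("imine", -5.9, -3.9),             # fails stability
        Candidate("beta-ketoenamine", -5.5, -3.5),  # fails band edge
        Candidate("vinylene", -6.4, -3.8),          # fails band gap
    ]
    print(f"hit rate: {hit_rate(evaluated):.2f}")
```

Under this definition, a search strategy's hit rate is simply the fraction of its evaluated candidates for which `is_hit` is true. If the reported 11.5$\times$ factor is a ratio of hit rates, the 52.7\% figure implies a random-search hit rate of roughly 4.6\% (52.7 / 11.5).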