🤖 AI Summary
Standard density functional theory (DFT) often predicts strongly correlated or structurally complex materials to be metallic when experiments report a band gap. To address this, we propose XDFT, a closed-loop, self-evolving agent that attributes each DFT–experiment band gap mismatch to an interpretable cause through hypothesis generation, first-principles validation, and Bayesian posterior updating. The approach automates mismatch diagnosis at scale by combining Bayesian inference, hypothesis-driven testing, and large language model (LLM)-based ranking, and it distils the material-specific findings into a four-line static rule. On a benchmark of 124 materials, XDFT identifies a resolving mechanism for 70 of 90 mismatch cases (78%), substantially outperforming a uniform-random baseline (19%) and a static LLM ordering (20%), and it flags the unresolved cases as evidence-backed targets for experimental re-examination.
📝 Abstract
Standard density functional theory (DFT) routinely misclassifies the electronic ground state of correlated and structurally complex compounds, predicting metallic behaviour for materials that experiments report as semiconductors. Each such mismatch encodes a specific non-ideality -- magnetic ordering, electron correlation, an alternative polymorph, or a defect -- that the calculation excluded, but extracting that signal at scale has remained a manual exercise. Here we introduce XDFT, a closed-loop agent that diagnoses the mismatch automatically: it draws candidate hypotheses from a curated catalogue, executes the corresponding first-principles tests, and updates a global Bayesian posterior over hypothesis usefulness from each verdict. On a verified benchmark of 124 materials, XDFT identifies a resolving mechanism for 70 of 90 mismatch cases (78\%), roughly four times the rate of a uniform-random baseline (19\%) and a static LLM ordering (20\%). The internal posterior aligns with empirical performance over the benchmark timeline, and resolved cases collapse into a tripartite element-class taxonomy that we distil into a four-line static rule. Each diagnosed material is returned with a corrected protocol and a mechanistic attribution; failed cases are flagged as evidence-backed targets for experimental re-examination.
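The closed loop described in the abstract (rank hypotheses, run a test, update a global posterior from the verdict) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration, not XDFT's actual implementation: the Beta-Bernoulli form of the posterior, greedy ranking by posterior mean, the `diagnose` and `run_test` names, and the placeholder materials with made-up ground truth. The first-principles test is replaced by a dictionary lookup.

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) belief that a hypothesis resolves a mismatch (assumed form)."""
    alpha: float = 1.0  # prior pseudo-count of successes
    beta: float = 1.0   # prior pseudo-count of failures

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, resolved: bool) -> None:
        # Conjugate Bernoulli update: one verdict adds one pseudo-count.
        if resolved:
            self.alpha += 1.0
        else:
            self.beta += 1.0


# The four non-idealities named in the abstract, used as a toy catalogue.
HYPOTHESES = ["magnetic_ordering", "electron_correlation",
              "alternative_polymorph", "defect"]


def diagnose(material, posteriors, run_test):
    """Test hypotheses in order of posterior mean; update after each verdict."""
    order = sorted(posteriors, key=lambda h: posteriors[h].mean(), reverse=True)
    for hypothesis in order:
        resolved = run_test(material, hypothesis)
        posteriors[hypothesis].update(resolved)
        if resolved:
            return hypothesis
    return None  # unresolved: candidate for experimental re-examination


# Hypothetical placeholder materials and ground truth (not benchmark data).
TRUTH = {"mat_A": "electron_correlation",
         "mat_B": "electron_correlation",
         "mat_C": "defect"}
calls = []


def run_test(material, hypothesis):
    """Stand-in for a first-principles calculation: a lookup in TRUTH."""
    calls.append((material, hypothesis))
    return TRUTH[material] == hypothesis


posteriors = {h: BetaPosterior() for h in HYPOTHESES}
results = [diagnose(m, posteriors, run_test) for m in TRUTH]

for h in HYPOTHESES:
    print(f"{h}: posterior mean = {posteriors[h].mean():.2f}")
```

In this toy run the posterior learned from `mat_A` moves `electron_correlation` to the front of the queue, so `mat_B` is resolved with a single test. That is the sense in which a global posterior can reduce the number of calculations per material; the selection policy XDFT actually uses may differ.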