🤖 AI Summary
Background: Large language model (LLM)-based agents have not yet demonstrated the ability to apply condensed matter physics knowledge and computational methodologies (such as density functional theory (DFT), group theory, and first-principles calculations) to quantum materials research.
Method: We introduce QMBench, the first comprehensive, domain-specific benchmark for this purpose. It systematically covers five core dimensions—structure, electronic properties, thermodynamics, symmetry, and computational practice—and establishes a standardized evaluation framework for AI scientists. Tasks are grounded in domain knowledge and integrate physical paradigms (e.g., DFT) with LLM-agent evaluation protocols.
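To make the five-dimension structure concrete, here is a minimal sketch of how a benchmark task record covering those dimensions might be represented. All field and dimension names below are illustrative assumptions, not taken from the QMBench release.

```python
from dataclasses import dataclass

# Hypothetical labels for the five core dimensions described above;
# the actual QMBench taxonomy may use different identifiers.
DIMENSIONS = {"structure", "electronic", "thermodynamics", "symmetry", "computation"}

@dataclass
class BenchmarkTask:
    """One evaluation item posed to an LLM agent (illustrative schema)."""
    task_id: str
    dimension: str         # one of the five core dimensions
    prompt: str            # research question posed to the agent
    reference_answer: str  # ground truth used for scoring

    def __post_init__(self):
        # Reject tasks outside the benchmark's taxonomy.
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension}")

# Example usage with placeholder content:
task = BenchmarkTask(
    task_id="sym-001",
    dimension="symmetry",
    prompt="Determine the space group of the given crystal structure.",
    reference_answer="(placeholder ground-truth answer)",
)
print(task.dimension)
```

A schema like this keeps every task tagged with exactly one dimension, which makes per-dimension scoring and coverage checks straightforward.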
Contribution/Results: QMBench provides a reproducible, extensible, open-source benchmark suite intended to accelerate the development of AI scientists with creative research capabilities and to serve as a standard evaluation tool for AI-driven quantum materials research.
📝 Abstract
We introduce QMBench, a comprehensive benchmark designed to evaluate the capability of large language model agents in quantum materials research. This specialized benchmark assesses an agent's ability to apply condensed matter physics knowledge and computational techniques, such as density functional theory, to solve research problems in quantum materials science. QMBench encompasses the main domains of quantum materials research, including structural properties, electronic properties, thermodynamic and other physical properties, symmetry principles, and computational methodologies. By providing a standardized evaluation framework, QMBench aims to accelerate the development of an AI scientist capable of making creative contributions to quantum materials research. We expect QMBench to be extended and continually improved by the research community.