QuantumBench: A Benchmark for Quantum Problem Solving

📅 2025-10-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Problem: Existing general-purpose LLM benchmarks inadequately assess deep conceptual understanding in specialized scientific domains such as quantum science. Method: We introduce QuantumBench, the first domain-specific multiple-choice evaluation benchmark for quantum science, comprising ~800 eight-option questions across nine subfields and emphasizing counterintuitive quantum phenomena and domain-specific symbolic notation. The dataset is systematically compiled from publicly available materials and rigorously curated and categorized. We evaluate leading open- and closed-weight LLMs, including an empirical analysis of accuracy and robustness under question-format perturbations. Contribution/Results: QuantumBench reveals capability gaps and reasoning limitations of current LLMs in quantum scientific reasoning, and it establishes a reproducible, extensible evaluation framework to guide the development of domain-specialized scientific language models.
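
To make the evaluation setup concrete, here is a minimal sketch of how an eight-option multiple-choice item could be represented and scored for accuracy. The item schema (question, options, answer_index), the A-H letter scheme, and the query_model callable are illustrative assumptions, not QuantumBench's published data format or harness.

```python
# Minimal sketch of scoring eight-option multiple-choice items.
# The item schema and query_model interface are assumptions for
# illustration; they are not QuantumBench's released format.

LETTERS = "ABCDEFGH"

def format_prompt(item):
    """Render one item as an A-H multiple-choice prompt."""
    lines = [item["question"]]
    lines += [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(item["options"])]
    lines.append("Answer with a single letter (A-H).")
    return "\n".join(lines)

def evaluate(items, query_model):
    """Return accuracy; query_model(prompt) should return a letter string."""
    correct = 0
    for item in items:
        reply = query_model(format_prompt(item)).strip().upper()
        predicted = LETTERS.find(reply[0]) if reply else -1
        correct += int(predicted == item["answer_index"])
    return correct / len(items)

# Example with a stub model that always answers "C":
demo = [{"question": "Which two-qubit state is maximally entangled?",
         "options": ["|00>", "|01>", "(|00>+|11>)/sqrt(2)", "|10>",
                     "|11>", "(|00>+|01>)/sqrt(2)", "|0>|+>", "|+>|->"],
         "answer_index": 2}]
print(evaluate(demo, lambda prompt: "C"))  # 1.0
```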

📝 Abstract
Large language models are now integrated into many scientific workflows, accelerating data analysis, hypothesis generation, and design-space exploration. Alongside this growth, there is an increasing need to carefully evaluate whether models accurately capture domain-specific knowledge and notation, since general-purpose benchmarks rarely reflect these requirements. This gap is especially clear in quantum science, which features non-intuitive phenomena and requires advanced mathematics. In this study, we introduce QuantumBench, a benchmark for the quantum domain that systematically examines how well LLMs understand this non-intuitive field and how effectively they can be applied to it. Using publicly available materials, we compiled approximately 800 question-answer pairs spanning nine areas related to quantum science and organized them into an eight-option multiple-choice dataset. With this benchmark, we evaluate several existing LLMs and analyze their performance in the quantum domain, including sensitivity to changes in question format. QuantumBench is the first LLM evaluation dataset built for the quantum domain, and it is intended to guide the effective use of LLMs in quantum research.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' domain-specific knowledge in quantum science
Assessing model sensitivity to question-format variations
Creating the first specialized LLM benchmark for quantum science
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces QuantumBench, the first LLM benchmark for the quantum domain
Compiles ~800 eight-option multiple-choice questions across nine areas
Evaluates LLM accuracy and sensitivity to question-format changes (see the sketch below)
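
One way to probe the format sensitivity noted above is to permute each item's answer options while remapping the gold index, then compare accuracy before and after. This is a hedged sketch of that idea; the paper's actual perturbations are not specified on this page and may differ.

```python
import random

def shuffle_options(item, seed=0):
    """Return a copy of an MCQ item with its options permuted and the
    gold answer index remapped to follow its option."""
    rng = random.Random(seed)
    order = list(range(len(item["options"])))
    rng.shuffle(order)
    return {
        "question": item["question"],
        "options": [item["options"][i] for i in order],
        "answer_index": order.index(item["answer_index"]),
    }

# Robustness estimate, reusing the evaluate() sketch above:
# shuffled = [shuffle_options(it, seed=i) for i, it in enumerate(items)]
# drop = evaluate(items, query_model) - evaluate(shuffled, query_model)
```

A large accuracy drop under such relabeling would indicate that a model is keying on surface format rather than the underlying quantum content.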
👥 Authors
Shunya Minami
National Institute of Advanced Industrial Science and Technology (AIST)
Tatsuya Ishigaki
National Institute of Advanced Industrial Science and Technology (AIST)
Natural Language Processing, Text Generation, Text Summarization
Ikko Hamamura
NVIDIA
Quantum Computing, Quantum Information
Taku Mikuriya
National Institute of Advanced Industrial Science and Technology (AIST); Yokohama National University
Youmi Ma
Institute of Science Tokyo
Information Extraction, Knowledge Acquisition, Natural Language Processing, Artificial Intelligence
Naoaki Okazaki
Institute of Science Tokyo
Natural Language Processing, Artificial Intelligence, Machine Learning
Hiroya Takamura
National Institute of Advanced Industrial Science and Technology (AIST)
Yohichi Suzuki
National Institute of Advanced Industrial Science and Technology (AIST)
Tadashi Kadowaki
Unknown affiliation