QCalEval: Benchmarking Vision-Language Models for Quantum Calibration Plot Understanding

📅 2026-04-28
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the lack of systematic evaluation of how well existing vision-language models understand quantum calibration plots. We present the first multimodal benchmark designed specifically for quantum calibration, covering diverse scenarios and question types across superconducting and neutral-atom quantum computing platforms, and so establish a standardized evaluation framework. Using zero-shot inference, in-context learning, and supervised fine-tuning, we evaluate state-of-the-art models and find significant performance gaps in multi-image in-context reasoning. The best general-purpose model reaches a zero-shot mean score of 72.3, while our newly released open-weight model, NVIDIA Ising Calibration 1, reaches 74.7 in the zero-shot setting.
📝 Abstract
Quantum computing calibration depends on interpreting experimental data, and calibration plots provide the most universal human-readable representation for this task, yet no systematic evaluation exists of how well vision-language models (VLMs) interpret them. We introduce QCalEval, the first VLM benchmark for quantum calibration plots: 243 samples across 87 scenario types from 22 experiment families, spanning superconducting qubits and neutral atoms, evaluated on six question types in both zero-shot and in-context learning settings. The best general-purpose zero-shot model reaches a mean score of 72.3, and many open-weight models degrade under multi-image in-context learning, whereas frontier closed models improve substantially. A supervised fine-tuning ablation at the 9-billion-parameter scale shows that SFT improves zero-shot performance but cannot close the multimodal in-context learning gap. As a reference case study, we release NVIDIA Ising Calibration 1, an open-weight model based on Qwen3.5-35B-A3B that reaches a zero-shot average score of 74.7.
Problem

Research questions and friction points this paper is trying to address.

quantum calibration
calibration plots
vision-language models
benchmarking
multimodal understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language models
quantum calibration
benchmark
in-context learning
multimodal evaluation
Shuxiang Cao
NVIDIA Corporation
Quantum computing, Superconducting circuits, AI for Science
Zijian Zhang
NVIDIA, University of Toronto, Vector Institute for Artificial Intelligence
Abhishek Agarwal
Higher Scientist, National Physical Laboratory
Quantum computing, quantum algorithms
Grace Bratrud
Fermi National Accelerator Laboratory, Northwestern University
Niyaz R. Beysengulov
EeroQ Corporation
Daniel C. Cole
Infleqtion
Alejandro Gómez Frieiro
IQM Quantum Computers
Elena O. Glen
EeroQ Corporation
Hao Hsu
IQM Quantum Computers
Gang Huang
LBNL, Tsinghua
Quantum Computing, RF control, timing and synchronization, femtosecond, LLRF
Raymond Jow
Conductor Quantum
Greshma Shaji
IQM Quantum Computers
Tom Lubowe
NVIDIA
Ligeng Zhu
Nvidia
Machine Learning, Efficient Deep Learning
Luis Mantilla Calderón
NVIDIA, University of Toronto, Vector Institute for Artificial Intelligence
Nicola Pancotti
NVIDIA
Joel Pendleton
Conductor Quantum
Brandon Severin
Conductor Quantum
Charles Etienne Staub
Harvard University
Sara Sussman
Fermi National Accelerator Laboratory
Antti Vepsäläinen
Postdoctoral researcher, Massachusetts Institute of Technology
Quantum engineering, Quantum computing
Neel Rajeshbhai Vora
Lawrence Berkeley National Laboratory
Yilun Xu
Lawrence Berkeley National Laboratory
Quantum Control, FPGA-based RF Control, AI / ML
Varinia Bernales
University of Toronto
Theoretical Chemistry, Catalysis, Green Chemistry
Daniel Bowring
Fermi National Accelerator Laboratory