Quantum Verifiable Rewards for Post-Training Qiskit Code Assistant

Date: 2025-08-28
AI Summary
Problem: Large language models (LLMs) exhibit low accuracy when generating executable Qiskit quantum code, and in particular often fail to produce circuits compatible with real quantum hardware. Method: This paper proposes a quantum-aware alignment framework with three components: (1) a synthetic data pipeline that produces problem-test pairs; (2) a quantum-verifiable reward mechanism that feeds execution results from physical quantum devices back into training; and (3) preference alignment that combines Direct Preference Optimization (DPO) with Group Relative Policy Optimization (GRPO). Contribution/Results: To our knowledge, this is the first work to optimize LLM-generated quantum code using feedback from quantum hardware. On the Qiskit-HumanEval-hard benchmark, the method outperforms the strongest open-source baselines, achieving new state-of-the-art performance in both functional correctness and hardware executability of generated Qiskit code.

๐Ÿ“ Abstract
Qiskit is an open-source quantum computing framework that allows users to design, simulate, and run quantum circuits on real quantum hardware. We explore post-training techniques for LLMs to assist in writing Qiskit code. We introduce quantum verification as an effective method for ensuring code quality and executability on quantum hardware. To support this, we developed a synthetic data pipeline that generates quantum problem-unit test pairs and used it to create preference data for aligning LLMs with DPO. Additionally, we trained models using GRPO, leveraging quantum-verifiable rewards provided by the quantum hardware. Our best-performing model, combining DPO and GRPO, surpasses the strongest open-source baselines on the challenging Qiskit-HumanEval-hard benchmark.
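The step from problem-unit test pairs to DPO preference data can be illustrated with a minimal sketch. It assumes each candidate solution has already been labeled as passing or failing its unit tests, and it pairs every passing candidate (chosen) with every failing one (rejected). The function name `build_preference_pairs` and the `prompt`/`chosen`/`rejected` field names are illustrative assumptions, not details taken from the paper.

```python
from itertools import product


def build_preference_pairs(prompt, candidates):
    """Build DPO-style preference records (hypothetical helper).

    candidates: list of (code, passed) tuples, where `passed` says whether
    the candidate satisfied the unit tests for this prompt.
    """
    passing = [code for code, passed in candidates if passed]
    failing = [code for code, passed in candidates if not passed]
    # Every passing candidate is preferred over every failing candidate.
    return [
        {"prompt": prompt, "chosen": good, "rejected": bad}
        for good, bad in product(passing, failing)
    ]


# Placeholder strings stand in for generated Qiskit programs.
pairs = build_preference_pairs(
    "Create a Bell-state circuit.",
    [("candidate_a", True), ("candidate_b", False), ("candidate_c", False)],
)
```

A prompt whose candidates all pass or all fail yields no pairs under this scheme, so such prompts would contribute nothing to the preference set.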
Problem

Research questions and friction points this paper is trying to address.

Enhancing Qiskit code generation with quantum verification
Ensuring code quality and executability on quantum hardware
Aligning LLMs using quantum-verifiable rewards from hardware
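As a rough illustration of a verifiable reward, the sketch below scores a candidate by running it against its unit tests and returning 1.0 on success and 0.0 on any failure. This is a simplified stand-in: it runs plain Python locally rather than on quantum hardware, the binary reward scale is an assumption, and `verifiable_reward` is a hypothetical name. The paper's actual reward design is not described in this summary.

```python
def verifiable_reward(candidate_code, test_code):
    """Return 1.0 if the candidate passes its tests, else 0.0 (assumed scale).

    Both arguments are Python source strings. The code is executed in a
    fresh namespace with no sandboxing, so this is for illustration only.
    """
    namespace = {}
    try:
        exec(candidate_code, namespace)  # define the candidate's functions
        exec(test_code, namespace)       # run assertions against them
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all score 0.
        return 0.0
    return 1.0


good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
tests = "assert add(2, 3) == 5\n"

rewards = [verifiable_reward(good, tests), verifiable_reward(bad, tests)]
```

In a hardware-backed setting, the test step would presumably submit the generated circuit to a backend and check the outcome. That part is omitted here because its details are not given in the source.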
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantum verification for code quality
Synthetic data pipeline for problem-unit pairs
Combining DPO and GRPO with quantum rewards
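GRPO, as commonly formulated, samples a group of completions per prompt and scores each one relative to the group: its reward minus the group mean, divided by the group standard deviation. The sketch below shows that normalization on a list of verifiable rewards. The population standard deviation and the `eps` value are assumptions made for illustration, and nothing here reflects the authors' specific training configuration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled completions.

    eps (an assumed value) avoids division by zero when every completion
    in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt: two pass verification, two fail.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When all completions in a group receive the same reward, every advantage is zero, so that group provides no learning signal.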
Authors

Nicolas Dupuis, IBM Quantum
Adarsh Tiwari, IBM Quantum
Youssef Mroueh, Principal Research Scientist, IBM T.J. Watson Research Center (Machine learning, Artificial intelligence)
David Kremer, IBM Quantum
Ismael Faro, IBM Quantum
Juan Cruz-Benito, IBM Quantum