QSpark: Towards Reliable Qiskit Code Generation

📅 2025-07-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (e.g., Granite-20B-Code, StarCoder) exhibit high error rates and limited error recovery when generating Qiskit quantum circuit code. Method: We propose a reinforcement learning–based reliability enhancement approach, fine-tuning a 32B-parameter model on a high-quality synthetic dataset using two preference optimization algorithms—GRPO and ORPO. Contribution/Results: On the Qiskit HumanEval benchmark, ORPO achieves 56.29% Pass@1, while GRPO attains 49.0%, substantially outperforming existing general-purpose baselines—particularly on foundational and medium-difficulty tasks. This work represents the first systematic application of ORPO to domain-specific quantum programming code generation, empirically validating its efficacy in this setting. It establishes a novel paradigm for improving automation reliability in quantum software development.
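The Pass@1 figures quoted above follow the standard unbiased pass@k estimator used by HumanEval-style benchmarks; a minimal sketch (the sample counts in the comment are illustrative, not from the paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c pass,
    is correct. pass@1 reduces to the passing fraction c/n."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws -> certain success
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 4 passing solutions out of 10 generations:
# pass_at_k(10, 4, 1) ~= 0.4
```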

📝 Abstract
Quantum circuits must be error-resilient, yet LLMs like Granite-20B-Code and StarCoder often output flawed Qiskit code. We fine-tuned a 32B model with two RL methods, Group Relative Policy Optimization (GRPO) and Odds-Ratio Preference Optimization (ORPO), using a richly annotated synthetic dataset. On the Qiskit HumanEval benchmark, ORPO reaches 56.29% Pass@1 (≈ +10 pp over Granite-8B-QK) and GRPO hits 49%, both beating all general-purpose baselines; on the original HumanEval they score 65.90% and 63.00%. GRPO excels on basic tasks (42/54), ORPO on intermediate ones (41/68), and neither solves the five advanced tasks, highlighting clear gains yet room for progress in AI-assisted quantum programming.
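ORPO augments the supervised loss on the chosen (passing) completion with an odds-ratio penalty that pushes its likelihood above the rejected one. A minimal pure-Python sketch of that penalty term, with the scalar probabilities standing in for sequence-level model likelihoods (an assumption for illustration; the paper does not give implementation details):

```python
import math

def odds(p: float) -> float:
    """Odds of an event with probability p."""
    return p / (1.0 - p)

def orpo_or_term(p_chosen: float, p_rejected: float) -> float:
    """Odds-ratio term of the ORPO objective:
    -log sigmoid(log(odds(p_chosen) / odds(p_rejected))).
    In training this is scaled by a hyperparameter and added to the
    usual negative log-likelihood on the chosen answer."""
    log_or = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_or)))

# The penalty shrinks as the model prefers the passing Qiskit solution:
# orpo_or_term(0.9, 0.1) < orpo_or_term(0.5, 0.5)
```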
Problem

Research questions and friction points this paper is trying to address.

Improving reliability of Qiskit code generation by LLMs
Enhancing error-resilience in quantum circuit implementations
Advancing AI-assisted quantum programming with RL methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned 32B model with GRPO and ORPO
Used synthetic dataset with rich annotations
Achieved top Qiskit HumanEval benchmark scores
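GRPO, the other method listed above, scores each sampled completion against its own sampling group's statistics instead of a learned value baseline. A minimal sketch of the group-relative advantage, assuming simple pass/fail rewards from unit tests (the reward design is an assumption; the paper does not specify it here):

```python
from statistics import mean, pstdev

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward by its group's mean and standard deviation,
    as in Group Relative Policy Optimization (GRPO)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled solutions to one prompt; 1.0 = generated code passes tests.
# Passing samples get positive advantage, failing ones negative:
advs = group_advantages([1.0, 0.0, 0.0, 1.0])
```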