🤖 AI Summary
Quantum algorithm design is hindered by the complexity of quantum mechanics and stringent control requirements, compounded by the absence of a dedicated benchmark for large language models (LLMs). Method: We introduce QCircuitBench, the first LLM-oriented benchmark for quantum algorithm design, comprising 25 canonical algorithms and 120,290 formally verified QASM circuits, underpinned by a formal framework that integrates program synthesis, hierarchical task modeling (organized into three task suites), fully automated verification, and interactive reasoning. Contribution/Results: Empirical analysis uncovers systematic error patterns in LLMs, including the counterintuitive finding that fine-tuning can underperform few-shot learning. End-to-end evaluation confirms baseline generation capability but reveals severe generalization bottlenecks. This work establishes a reproducible, verifiable infrastructure and delivers foundational insights for AI-driven quantum programming.
📝 Abstract
Quantum computing is an emerging field recognized for the significant speedup it offers over classical computing through quantum algorithms. However, designing and implementing quantum algorithms pose challenges due to the complex nature of quantum mechanics and the necessity for precise control over quantum states. Despite significant advances in AI, datasets specifically tailored to this purpose have been lacking. In this work, we introduce QCircuitBench, the first benchmark dataset designed to evaluate AI's capability in designing and implementing quantum algorithms using quantum programming languages. Unlike using AI to write traditional code, this task is fundamentally more complex due to the highly flexible design space. Our key contributions include: 1. A general framework that formulates the key features of quantum algorithm design for Large Language Models. 2. Implementations of quantum algorithms from basic primitives to advanced applications, spanning 3 task suites, 25 algorithms, and 120,290 data points. 3. Automatic validation and verification functions, allowing for iterative evaluation and interactive reasoning without human inspection. 4. Promising potential as a training dataset, demonstrated through preliminary fine-tuning results. We observed several notable experimental phenomena: LLMs tend to exhibit consistent error patterns, and fine-tuning does not always outperform few-shot learning. Overall, QCircuitBench is a comprehensive benchmark for LLM-driven quantum algorithm design, and it reveals the limitations of LLMs in this domain.
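To make concrete the kind of algorithm-design task the abstract describes (producing a circuit and checking it automatically, with no human inspection), here is a minimal illustrative sketch. It is not taken from the QCircuitBench dataset: it simulates Deutsch's algorithm in its phase-oracle form with plain Python linear algebra, then "verifies" the result the way an automatic checker might, by comparing the measured outcome against the oracle's known constant/balanced label.

```python
# Illustrative sketch only (not the benchmark's actual verification code):
# Deutsch's algorithm with a phase oracle U_f|x> = (-1)^f(x) |x>.
# Sequence: |0> -> H -> U_f -> H -> measure.
# Outcome 0 means f is constant; outcome 1 means f is balanced.

S = 1 / 2 ** 0.5  # 1/sqrt(2)
H = [[S, S],
     [S, -S]]     # Hadamard gate as a 2x2 matrix

def apply(gate, state):
    """Multiply a 2x2 gate into a length-2 state vector."""
    return [sum(gate[i][j] * state[j] for j in range(2)) for i in range(2)]

def phase_oracle(f):
    """Diagonal oracle: |x> -> (-1)^f(x) |x>."""
    return [[(-1) ** f(0), 0],
            [0, (-1) ** f(1)]]

def deutsch(f):
    state = [1.0, 0.0]                    # start in |0>
    state = apply(H, state)               # uniform superposition
    state = apply(phase_oracle(f), state) # one oracle query
    state = apply(H, state)               # interfere
    # The outcome is deterministic: amplitude sits entirely on |0> or |1>.
    return 0 if abs(state[0]) > 0.5 else 1

def verify(f, is_constant):
    """Automatic check: does the circuit's answer match the oracle's label?"""
    return deutsch(f) == (0 if is_constant else 1)

print(verify(lambda x: 0, True))   # constant oracle
print(verify(lambda x: x, False))  # balanced oracle
```

The dataset itself stores circuits as QASM programs rather than Python matrices; the sketch only conveys the verification loop's structure: run the candidate circuit against a known oracle and compare outcomes programmatically.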