🤖 AI Summary
This work addresses the limitations of traditional rule-based static analysis tools for quantum programs, which struggle to keep up with rapidly evolving APIs and have limited ability to detect context-sensitive issues. The paper introduces two LLM-based linters, LintQ-LLM+CoT and LintQ-LLM+RAG, which specialize a large language model (LLM) via chain-of-thought (CoT) prompting and via retrieval-augmented generation (RAG), respectively; the RAG variant grounds the model's reasoning in a curated knowledge base of verified quantum programming problems and best practices. In a manual evaluation on 55 Qiskit programs, the LLM-based approaches reach F1 scores of 0.70 and 0.68 (the latter with RAG), compared with 0.41 for the conventional rule-based tool LintQ, with better precision and recall. The RAG variant shows slightly higher precision, meaning fewer false positives.
📝 Abstract
As quantum computing transitions from theoretical experimentation to practical application, the reliability of quantum software has become a critical bottleneck. Traditional static analysis techniques for quantum programs, primarily rule-based linters, are increasingly inadequate: they struggle to keep pace with rapidly evolving APIs and fail to capture complex, context-dependent quantum programming problems. This results in high maintenance overhead and limited detection capabilities. In this paper, we introduce LintQ-LLM+CoT and LintQ-LLM+RAG, two novel approaches to detecting quantum programming problems with Large Language Models (LLMs). The first specializes the LLM via Chain-of-Thought (CoT) prompting; the second uses a Retrieval-Augmented Generation (RAG) system that grounds the model's reasoning in a curated knowledge base of verified quantum programming problems and best practices. We conducted a rigorous, manual comparative evaluation against the state-of-the-art rule-based tool, LintQ, using a corpus of 55 Qiskit programs. Our results show that the LLM-based approaches, with and without RAG, outperform LintQ in both the correctness (precision) and the completeness (recall) of quantum programming problem detection. Overall, the LLM-based approaches were more effective than LintQ (F1-scores of 0.70 and 0.68 vs. 0.41). Furthermore, the RAG-enhanced variant demonstrated slightly higher precision, effectively reducing false positives. Our findings suggest that LLMs provide a scalable and adaptive foundation for the next generation of linters in quantum software engineering.
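For readers less familiar with the metrics named above, the following is a minimal sketch of how precision, recall, and F1 are conventionally derived from counts of true positives, false positives, and false negatives. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper's evaluation.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for a linter's reported warnings.

    tp: warnings that point at a real problem
    fp: warnings that do not correspond to a real problem
    fn: real problems the linter did not report
    """
    # Precision (correctness): share of reported warnings that are real problems.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (completeness): share of real problems that were reported.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only; NOT results from the paper.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Under these definitions, a variant that raises fewer false positives gains precision even when its recall is unchanged, which is the sense in which the abstract links the RAG variant's higher precision to fewer false positives.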