Automated Discovery of Test Oracles for Database Management Systems Using LLMs

📅 2025-10-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automated DBMS testing has long been limited by the manual construction of test oracles, while existing LLM-driven approaches suffer from high false-positive rates and high computational overhead, largely due to hallucination. This paper proposes Argus, a framework built on the concept of the *Constrained Abstract Query*: it uses LLMs to generate pairs of SQL skeletons asserted to be semantically equivalent, then validates each pair with a lightweight formal SQL equivalence solver, balancing generative creativity with logical correctness guarantees. By combining program synthesis with formal verification, Argus enables scalable, low-false-positive discovery and instantiation of test oracles. Evaluated on five mainstream DBMSs, it uncovered 40 previously unknown bugs (35 of them logic bugs), of which 36 have been confirmed and 26 fixed. The framework significantly advances the automation, reliability, and practical applicability of DBMS testing.

📝 Abstract
Since 2020, automated testing for Database Management Systems (DBMSs) has flourished, uncovering hundreds of bugs in widely-used systems. A cornerstone of these techniques is the test oracle, which typically implements a mechanism to generate equivalent query pairs, thereby identifying bugs by checking the consistency between their results. However, while applying these oracles can be automated, their design remains a fundamentally manual endeavor. This paper explores the use of large language models (LLMs) to automate the discovery and instantiation of test oracles, addressing a long-standing bottleneck towards fully automated DBMS testing. Although LLMs demonstrate impressive creativity, they are prone to hallucinations that can produce numerous false positive bug reports. Furthermore, their significant monetary cost and latency mean that LLM invocations should be limited to ensure that bug detection is efficient and economical. To this end, we introduce Argus, a novel framework built upon the core concept of the Constrained Abstract Query - a SQL skeleton containing placeholders and their associated instantiation conditions (e.g., requiring a placeholder to be filled by a boolean column). Argus uses LLMs to generate pairs of these skeletons that are asserted to be semantically equivalent. This equivalence is then formally proven using a SQL equivalence solver to ensure soundness. Finally, the placeholders within the verified skeletons are instantiated with concrete, reusable SQL snippets that are also synthesized by LLMs to efficiently produce complex test cases. We implemented Argus and evaluated it on five extensively tested DBMSs, discovering 40 previously unknown bugs, 35 of which are logic bugs, with 36 confirmed and 26 already fixed by the developers.
Problem

Research questions and friction points this paper is trying to address.

Automating test oracle discovery for DBMS using large language models
Addressing manual design bottleneck in database management system testing
Reducing false positives and costs in LLM-based bug detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using LLMs to generate equivalent SQL query pairs
Formally proving semantic equivalence with SQL solvers
Instantiating verified skeletons with reusable SQL snippets
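The pipeline above can be illustrated with a minimal sketch: a pair of SQL skeletons claimed to be equivalent, a placeholder filled with a concrete predicate, and a differential check of both instantiations against a real DBMS. All names here (`SKELETON_A`, `instantiate`, the `NOT (NOT (...))` rewrite) are illustrative assumptions, not Argus's actual API or oracle set; the formal equivalence-solver step is also omitted.

```python
import sqlite3

# Hypothetical constrained abstract query pair: a SQL skeleton with a
# placeholder {pred}, constrained to be filled by a boolean predicate.
# The two skeletons are claimed to be semantically equivalent (in Argus,
# this claim would be checked by a formal SQL equivalence solver).
SKELETON_A = "SELECT a FROM t WHERE {pred}"
SKELETON_B = "SELECT a FROM t WHERE NOT (NOT ({pred}))"

def instantiate(skeleton, pred):
    """Fill the placeholder with a concrete, reusable SQL snippet."""
    return skeleton.format(pred=pred)

def run(conn, sql):
    """Execute a query and return its rows in a canonical order."""
    return sorted(conn.execute(sql).fetchall(), key=repr)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(0,), (1,), (2,), (None,)])

# Differential check: instantiate both skeletons with the same snippet
# and compare results; a mismatch would indicate a DBMS logic bug.
pred = "a > 0"
r1 = run(conn, instantiate(SKELETON_A, pred))
r2 = run(conn, instantiate(SKELETON_B, pred))
assert r1 == r2  # NOT (NOT p) matches p under SQL three-valued logic
```

Note that the skeleton pair must be equivalent under SQL's three-valued logic, not just boolean logic (here, the `NULL` row is filtered out by both forms), which is exactly why formally proving equivalence before instantiation matters.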
Qiuyang Mang (University of California, Berkeley; Software Engineering, Database)
Runyuan He (National University of Singapore)
Suyang Zhong (National University of Singapore)
Xiaoxuan Liu (National University of Singapore)
Huanchen Zhang (Assistant Professor, Tsinghua University; Database Systems, Data Structures)
Alvin Cheung (UC Berkeley)