Testing Database Systems with Large Language Model Synthesized Fragments

📅 2025-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Conventional SQL test generators rely on manually written rules, which limits the SQL features they cover and leaves some defects in database management systems (DBMSs) unexposed. Method: The paper introduces ShQveL, which uses SQL sketches (SQL statements with incomplete code segments that a large language model fills in) to synthesize SQL fragments and integrate them into existing test-case generators. This broadens the SQL features exercised by testing approaches based on equivalent-query generation, beyond what manually written generators support. Contribution/Results: Evaluated on five DBMSs, the approach uncovered 55 unique, previously unknown bugs, 50 of which were fixed after being reported.

📝 Abstract

Various automated testing approaches have been proposed for Database Management Systems (DBMSs). Many such approaches generate pairs of equivalent queries to identify bugs that cause DBMSs to compute incorrect results, and have found hundreds of bugs in mature, widely used DBMSs. Most of these approaches are based on manually written SQL generators; however, their bug-finding capabilities remain constrained by the limited set of SQL features supported by the generators. In this work, we propose ShQveL, an approach that augments existing SQL test-case generators by leveraging Large Language Models (LLMs) to synthesize SQL fragments. Our key idea is to systematically incorporate SQL features gained through automated interactions with LLMs into the SQL generators, increasing the features covered while efficiently generating test cases. Specifically, ShQveL uses SQL sketches (SQL statements with incomplete code segments that LLMs fill) to integrate LLM-generated content into the generator. We evaluated ShQveL on 5 DBMSs and discovered 55 unique and previously unknown bugs, 50 of which were promptly fixed after our reports.
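
To make the idea of a SQL sketch more concrete, the following is a minimal illustration, not ShQveL's actual implementation. It assumes a sketch is a query template with a single hole, replaces the LLM call with a hypothetical stub (`fake_llm_fill`) that returns hard-coded candidates, and uses SQLite from Python's standard library as the target DBMS. The table `t0`, the `{hole}` placeholder syntax, and all function names are assumptions made for this example. Candidates that the DBMS rejects are discarded, so only usable fragments would reach a generator.

```python
import sqlite3

# Hypothetical sketch: a SQL statement with one incomplete segment ("hole").
SKETCH = "SELECT {hole} FROM t0"


def fake_llm_fill(sketch):
    """Stand-in for an LLM call (assumption): returns candidate fragments
    for the hole. The last candidate is deliberately invalid."""
    return ["abs(c0)", "coalesce(c0, 0) + 1", "no_such_function(c0)"]


def validate_fragments(sketch, candidates):
    """Keep only the fragments that the target DBMS accepts when they are
    substituted into the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t0 (c0 INTEGER)")
    conn.execute("INSERT INTO t0 VALUES (1), (-2), (NULL)")
    accepted = []
    for fragment in candidates:
        query = sketch.format(hole=fragment)
        try:
            conn.execute(query).fetchall()
        except sqlite3.Error:
            # The DBMS rejected this fragment (e.g. unknown function).
            continue
        accepted.append(fragment)
    conn.close()
    return accepted


FRAGMENTS = validate_fragments(SKETCH, fake_llm_fill(SKETCH))
print(FRAGMENTS)
```

In this toy setting the validated fragments could be cached and reused by a generator many times, so the comparatively expensive LLM interaction would happen once per fragment rather than once per test case.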

Problem

Research questions and friction points this paper is trying to address.

Enhancing SQL test-case generation using LLMs

Expanding SQL feature coverage for DBMS testing

Identifying unknown bugs in DBMSs via synthesized fragments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages LLMs to synthesize SQL fragments

Uses SQL sketches for LLM-generated content integration

Augments existing SQL generators for broader feature coverage
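
As a rough illustration of how a generator augmented with synthesized fragments might feed an equivalent-query check, the sketch below draws predicates from a fragment pool and applies a partitioning-style oracle. The source does not say which oracle ShQveL uses; the check here follows the general idea of Ternary Logic Partitioning (a row count must equal the sum of the counts where a predicate is true, false, and NULL) and is swapped in purely as an example. The pool contents, the table `t0`, and the function names are hypothetical.

```python
import random
import sqlite3

# Hypothetical pool of predicate fragments, e.g. previously synthesized
# by an LLM and validated against the target DBMS.
FRAGMENT_POOL = ["c0 > 0", "abs(c0) = 2", "coalesce(c0, 0) < 1"]


def partition_check(conn, predicate):
    """Return True if the unpartitioned row count equals the sum of the
    counts for predicate TRUE, FALSE, and NULL."""
    total = conn.execute("SELECT COUNT(*) FROM t0").fetchone()[0]
    variants = (
        f"({predicate})",
        f"NOT ({predicate})",
        f"({predicate}) IS NULL",
    )
    parts = 0
    for variant in variants:
        row = conn.execute(f"SELECT COUNT(*) FROM t0 WHERE {variant}").fetchone()
        parts += row[0]
    return total == parts


def run_campaign(seed=0, rounds=10):
    """Pick fragments at random and collect those that violate the oracle."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t0 (c0 INTEGER)")
    conn.executemany("INSERT INTO t0 VALUES (?)", [(1,), (-2,), (None,), (0,)])
    failures = []
    for _ in range(rounds):
        predicate = rng.choice(FRAGMENT_POOL)
        if not partition_check(conn, predicate):
            failures.append(predicate)
    conn.close()
    return failures
```

A non-empty result from `run_campaign()` would indicate a mismatch between the equivalent queries and therefore a candidate bug to inspect; on a correct DBMS the list is expected to be empty.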
