Hallucination Detection for LLM-based Text-to-SQL Generation via Two-Stage Metamorphic Testing

📅 2025-12-23
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Hallucination-induced SQL errors in LLM-based text-to-SQL generation are difficult to detect due to the absence of ground-truth SQL labels. Method: This paper proposes a label-free, two-stage metamorphic testing framework: (1) structure-aware schema perturbation to detect schema-linking errors; and (2) logic-aware semantic mutation coupled with cross-output consistency checking to localize logical synthesis errors. Contribution/Results: It introduces the first "structure–logic" dual-dimensional metamorphic relation design, eliminating reliance on human-annotated SQL and enabling end-to-end hallucination localization. Evaluated on standard benchmarks, it achieves F1-scores of 69.36%–82.76%, significantly outperforming baselines such as LLM self-assessment. The approach establishes a novel, trustworthy paradigm for diagnosing errors in LLM-driven database interaction.

πŸ“ Abstract
In Text-to-SQL generation, large language models (LLMs) have shown strong generalization and adaptability. However, LLMs sometimes generate hallucinations, i.e., unrealistic or illogical content, which leads to incorrect SQL queries and negatively impacts downstream applications. Detecting these hallucinations is particularly challenging. Existing Text-to-SQL error detection methods, which are tailored for traditional deep learning models, face significant limitations when applied to LLMs, primarily due to the scarcity of ground-truth data. To address this challenge, we propose SQLHD, a novel hallucination detection method based on metamorphic testing (MT) that does not require standard answers. SQLHD splits the detection task into two sequential stages: schema-linking hallucination detection via eight structure-aware Metamorphic Relations (MRs) that perturb comparative words, entities, sentence structure, or database schema, and logical-synthesis hallucination detection via nine logic-aware MRs that mutate prefix words, extremum expressions, comparison ranges, or the entire database. In each stage the LLM is invoked separately to generate schema mappings or SQL artefacts; the follow-up outputs are cross-checked against their source counterparts through the corresponding MRs, and any violation is flagged as a hallucination without requiring ground-truth SQL. The experimental results demonstrate our method's superior performance in terms of the F1-score, which ranges from 69.36% to 82.76%. Additionally, SQLHD outperforms LLM Self-Evaluation methods, effectively identifying hallucinations in Text-to-SQL tasks.
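The first stage's cross-checking idea can be sketched as a minimal structure-aware metamorphic relation. Everything below is illustrative, not the paper's implementation: `llm_schema_link` is a toy stand-in for the LLM invocation that produces schema mappings, and the column-rename perturbation is just one plausible instance of the eight structure-aware MRs, which the source describes only at a high level.

```python
# Hedged sketch of a structure-aware MR check for schema linking.
# llm_schema_link is a hypothetical stand-in for an LLM call; it naively
# links question tokens to schema columns by exact name match.

def llm_schema_link(question: str, schema: dict) -> set:
    """Toy stand-in: map a question to the schema columns it references."""
    words = {w.strip("?,.").lower() for w in question.split()}
    return {
        f"{table}.{col}"
        for table, cols in schema.items()
        for col in cols
        if col.lower() in words
    }

def rename_column(schema: dict, table: str, old: str, new: str) -> dict:
    """Follow-up perturbation: rename one column in a copy of the schema."""
    perturbed = {t: list(cols) for t, cols in schema.items()}
    perturbed[table] = [new if c == old else c for c in perturbed[table]]
    return perturbed

def schema_linking_mr_violated(question, schema, table, old, new) -> bool:
    """MR: renaming a column the question never mentions must not change
    the linked set; a difference is flagged as a schema-linking hallucination."""
    source_links = llm_schema_link(question, schema)
    follow_links = llm_schema_link(question, rename_column(schema, table, old, new))
    return source_links != follow_links

schema = {"employees": ["salary", "name", "dept"]}
q = "What is the salary of each employee?"
# Renaming the unmentioned "dept" column leaves the linking unchanged: no flag.
print(schema_linking_mr_violated(q, schema, "employees", "dept", "division"))
```

The point of the sketch is that no ground-truth SQL appears anywhere: the source output is judged only against the follow-up output under the perturbation.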
Problem

Research questions and friction points this paper is trying to address.

Detects hallucinations in LLM-generated SQL queries without ground-truth data.
Addresses limitations of existing error detection methods for LLM-based Text-to-SQL.
Uses two-stage metamorphic testing to identify schema-linking and logical-synthesis errors.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage metamorphic testing without ground-truth data
Structure-aware MRs perturb schema and comparative elements
Logic-aware MRs mutate SQL expressions and database content
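A logic-aware MR from the second stage can be sketched the same way. Again this is a hedged illustration: `llm_text_to_sql` is a toy generator standing in for the LLM, and the extremum flip ("highest" to "lowest") is one plausible instance of the nine logic-aware mutations the source lists without detail.

```python
# Hedged sketch of a logic-aware MR: flipping the extremum in the question
# should only reverse the sort direction in the generated SQL.
# llm_text_to_sql is a hypothetical stand-in for the LLM call.

def llm_text_to_sql(question: str) -> str:
    """Toy generator covering only 'highest'/'lowest' salary questions."""
    direction = "DESC" if "highest" in question else "ASC"
    return f"SELECT name FROM employees ORDER BY salary {direction} LIMIT 1"

def extremum_mr_violated(question: str) -> bool:
    """MR: mutating 'highest' -> 'lowest' must change the source SQL only
    in its ORDER BY direction; any other difference is flagged as a
    logical-synthesis hallucination."""
    mutated = question.replace("highest", "lowest")
    source_sql = llm_text_to_sql(question)
    follow_sql = llm_text_to_sql(mutated)
    expected = source_sql.replace("DESC", "ASC")
    return follow_sql != expected

# The toy generator is consistent under the mutation, so no flag is raised.
print(extremum_mr_violated("Who has the highest salary?"))
```

With a real model, the two invocations can disagree in ways beyond the sort direction, and that inconsistency, rather than a comparison against gold SQL, is what gets reported.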
Bo Yang
Beijing Forestry University, China
Yinfen Xia
Beijing Forestry University, China
Weisong Sun
Nanyang Technological University
Trustworthy Intelligent SE (Software Engineering)
Yang Liu
Nanyang Technological University, Singapore