Query Carefully: Detecting the Unanswerables in Text-to-SQL Tasks

📅 2025-12-19
🤖 AI Summary
Problem: Biomedical text-to-SQL systems often generate executable but misleading SQL for ambiguous, out-of-scope, or unanswerable queries, which undermines reliability. Method: We propose an explicit refusal mechanism to improve trustworthiness, introducing (i) novel No-Answer Rules (NAR) and a balanced few-shot prompting strategy; (ii) OncoMX-NAQ, the first biomedical unanswerable-question benchmark (80 instances across 8 categories); and (iii) a unified framework integrating schema-aware prompting, rule-guided learning, and structured refusal classification. Contributions/Results: The approach yields interpretable refusal decisions and includes a lightweight, interactive visualization interface. On OncoMX-NAQ, it achieves 0.80 overall refusal accuracy, with near-perfect accuracy on structurally defined categories (non-SQL queries, missing-column cases, and out-of-domain questions). The interface presents the generated SQL, execution results, and human-readable refusal rationales together.
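
The refusal accuracy quoted above can be read as the share of unanswerable questions on which the system abstains, overall and per category. A minimal sketch of that computation follows; it assumes every evaluated question is unanswerable, so an abstention counts as correct. The function name, the category labels, and the toy records are invented for illustration and are not the paper's data.

```python
from collections import defaultdict

def detection_accuracy(records):
    """Compute abstention accuracy over unanswerable questions.

    records: iterable of (category, abstained) pairs, where `abstained`
    is True when the system refused to answer.
    Returns (overall_accuracy, {category: accuracy}).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, abstained in records:
        totals[category] += 1
        hits[category] += int(abstained)
    if not totals:
        raise ValueError("no records to evaluate")
    per_category = {c: hits[c] / totals[c] for c in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_category

# Toy records (invented): two categories with two questions each.
toy = [
    ("non_sql", True),
    ("non_sql", True),
    ("column_ambiguity", True),
    ("column_ambiguity", False),
]
overall, per_category = detection_accuracy(toy)
```

Reporting the per-category values next to the overall number matters here, because an aggregate score can hide weak categories such as ambiguity cases.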

📝 Abstract
Text-to-SQL systems allow non-SQL experts to interact with relational databases using natural language. However, their tendency to generate executable SQL for ambiguous, out-of-scope, or unanswerable queries introduces a hidden risk, as outputs may be misinterpreted as correct. This risk is especially serious in biomedical contexts, where precision is critical. We therefore present Query Carefully, a pipeline that integrates LLM-based SQL generation with explicit detection and handling of unanswerable inputs. Building on the OncoMX component of ScienceBenchmark, we construct OncoMX-NAQ (No-Answer Questions), a set of 80 no-answer questions spanning 8 categories (non-SQL, out-of-schema/domain, and multiple ambiguity types). Our approach employs llama3.3:70b with schema-aware prompts, explicit No-Answer Rules (NAR), and few-shot examples drawn from both answerable and unanswerable questions. We evaluate SQL exact match, result accuracy, and unanswerable-detection accuracy. On the OncoMX dev split, few-shot prompting with answerable examples increases result accuracy, and adding unanswerable examples does not degrade performance. On OncoMX-NAQ, balanced prompting achieves the highest unanswerable-detection accuracy (0.8), with near-perfect results for structurally defined categories (non-SQL, missing columns, out-of-domain) but persistent challenges for missing-value queries (0.5) and column ambiguity (0.3). A lightweight user interface surfaces interim SQL, execution results, and abstentions, supporting transparent and reliable text-to-SQL in biomedical applications.
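
As an illustration of the prompting setup described in the abstract, the sketch below assembles a schema-aware prompt from a schema string, a list of no-answer rules, and an equal number of answerable and unanswerable few-shot examples. It is a sketch under stated assumptions, not the authors' prompt: the `NO_ANSWER` marker, the rule wording, the section headers, and the toy schema are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical abstention marker; the paper's actual output format is not given here.
NO_ANSWER = "NO_ANSWER"

# Illustrative rules only; the paper's actual NAR wording is not reproduced here.
NO_ANSWER_RULES = [
    f"If the question cannot be expressed as SQL, reply {NO_ANSWER}.",
    f"If a required table or column is missing from the schema, reply {NO_ANSWER}.",
    f"If the question is ambiguous with respect to the schema, reply {NO_ANSWER}.",
]

@dataclass
class Example:
    question: str
    sql: Optional[str]  # None marks an unanswerable example

def build_prompt(schema: str, question: str,
                 answerable: list[Example], unanswerable: list[Example],
                 k: int = 2) -> str:
    """Assemble a prompt with schema, rules, and up to k + k few-shot examples."""
    # Balanced prompting: take the same number of examples from each pool.
    shots = answerable[:k] + unanswerable[:k]
    lines = ["### Database schema", schema, "", "### No-answer rules"]
    lines += [f"- {rule}" for rule in NO_ANSWER_RULES]
    lines += ["", "### Examples"]
    for ex in shots:
        target = ex.sql if ex.sql is not None else NO_ANSWER
        lines += [f"Q: {ex.question}", f"A: {target}", ""]
    lines += ["### Question", f"Q: {question}", "A:"]
    return "\n".join(lines)

# Toy usage with an invented single-table schema.
prompt = build_prompt(
    "CREATE TABLE gene (id INTEGER, symbol TEXT);",
    "How many genes are there?",
    answerable=[Example("List gene symbols.", "SELECT symbol FROM gene;")],
    unanswerable=[Example("What is the weather today?", None)],
    k=1,
)
```

The resulting string would be sent to the model (llama3.3:70b in the paper); the call itself is omitted because the serving setup is not described in this summary.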

Problem

Research questions and friction points this paper is trying to address.

Detecting unanswerable natural language queries in biomedical text-to-SQL systems
Preventing misleading SQL generation for ambiguous or out-of-scope questions
Improving reliability of biomedical database queries through explicit abstention mechanisms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates LLM-based SQL generation with unanswerable detection
Uses schema-aware prompts, explicit rules, and few-shot examples
Implements a lightweight UI for transparent biomedical text-to-SQL
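
The combination of SQL generation, abstention handling, and transparent output listed above can be sketched as a small wrapper around the model call. This is a simplified illustration, not the paper's implementation: `fake_llm` stands in for the real model, the `NO_ANSWER` marker and the `QueryOutcome` fields are hypothetical, an in-memory SQLite table replaces the OncoMX database, and reporting a failed execution as an abstention is a choice made for this example only.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, Optional

NO_ANSWER = "NO_ANSWER"  # hypothetical abstention marker

@dataclass
class QueryOutcome:
    sql: Optional[str] = None                  # generated SQL, if any
    rows: list = field(default_factory=list)   # execution results
    abstained: bool = False                    # True when no answer is given
    reason: str = ""                           # human-readable rationale

def answer(question: str, generate: Callable[[str], str],
           conn: sqlite3.Connection) -> QueryOutcome:
    """Generate SQL for a question, or surface an explicit abstention."""
    raw = generate(question).strip()
    if raw.upper().startswith(NO_ANSWER):
        # Everything after the marker is treated as the rationale.
        reason = raw[len(NO_ANSWER):].lstrip(": ").strip()
        return QueryOutcome(abstained=True, reason=reason or "model abstained")
    try:
        rows = conn.execute(raw).fetchall()
    except sqlite3.Error as exc:
        # Example-specific choice: a failed query is shown as an abstention.
        return QueryOutcome(sql=raw, abstained=True, reason=f"SQL failed: {exc}")
    return QueryOutcome(sql=raw, rows=rows)

# Toy database and stand-in model (both invented for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (id INTEGER, symbol TEXT)")
conn.executemany("INSERT INTO gene VALUES (?, ?)", [(1, "TP53"), (2, "BRCA1")])

def fake_llm(question: str) -> str:
    if "weather" in question.lower():
        return "NO_ANSWER: question is outside the database domain"
    return "SELECT COUNT(*) FROM gene"

answered = answer("How many genes are there?", fake_llm, conn)
refused = answer("What is the weather today?", fake_llm, conn)
```

Returning the SQL, the rows, and the refusal reason in one object mirrors what the interface is said to surface, so a front end can show all three without re-running the query.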

Authors

Jasmin Saxer
Institute of Computer Science, Zurich University of Applied Sciences, Technikumstrasse 9, 8401 Winterthur, Switzerland

Isabella Maria Aigner
Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland

Luise Linzmeier
Department of Gastroenterology and Hepatology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland

Andreas Weiler
Institute of Computer Science, Zurich University of Applied Sciences, Technikumstrasse 9, 8401 Winterthur, Switzerland

Kurt Stockinger
Professor of Computer Science, Zurich University of Applied Sciences
Data Science, Big Data, Database Systems, Natural Language Interfaces, Quantum Machine Learning