🤖 AI Summary
This work addresses the safety risks posed by large language models (LLMs) in text-to-SQL tasks, where ambiguous or unanswerable queries often lead to syntactically valid but semantically incorrect or constraint-violating SQL. To mitigate this, the authors frame safe refusal as an answerability-gating problem and propose LatentRefusal, which uses a lightweight Tri-Residual Gated Encoder to read intermediate hidden activations of the LLM. The encoder suppresses schema-induced noise while amplifying sparse, localized signals of mismatch between the input question and the database schema. Designed as a plug-and-play safety layer, the approach reaches an average F1 of 88.5% across four benchmarks on both evaluated LLM backbones while adding only about 2 ms of probe overhead.
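The answerability-gating framing can be pictured as a thin wrapper that runs a probe before any SQL is produced. The sketch below is a minimal illustration under stated assumptions: `gated_text_to_sql`, `GateResult`, `toy_probe`, `fake_generate`, and the 0.5 threshold are hypothetical names and values, and the keyword-matching `toy_probe` merely stands in for the latent probe, which in the described approach reads the LLM's intermediate hidden activations rather than raw strings.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GateResult:
    answerable: bool
    score: float          # probe's estimated P(answerable)
    sql: Optional[str]    # None when the query is refused


def gated_text_to_sql(
    question: str,
    schema: str,
    probe: Callable[[str, str], float],
    generate_sql: Callable[[str, str], str],
    threshold: float = 0.5,
) -> GateResult:
    """Hypothetical answerability gate: score first, generate SQL only if the gate opens."""
    score = probe(question, schema)
    if score < threshold:
        # Refuse before generation, so no executable SQL is ever produced.
        return GateResult(answerable=False, score=score, sql=None)
    return GateResult(answerable=True, score=score, sql=generate_sql(question, schema))


# Toy stand-ins (hypothetical): the probe flags a question that mentions a column
# the schema lacks; the generator returns a fixed query.
def toy_probe(question: str, schema: str) -> float:
    return 0.1 if "bonus" in question and "bonus" not in schema else 0.9


def fake_generate(question: str, schema: str) -> str:
    return "SELECT AVG(salary) FROM employees;"


schema = "employees(id, name, salary)"
for q in ("What is the average salary?", "What is the average bonus?"):
    print(q, "->", gated_text_to_sql(q, schema, toy_probe, fake_generate))
```

Because the refusal decision is made before generation, a refused query never yields an executable statement. In a real deployment the probe would presumably reuse the backbone's forward pass over the prompt instead of being a separate call, which is what keeps its overhead small.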
📝 Abstract
In LLM-based text-to-SQL systems, unanswerable and underspecified user queries may lead the model to generate not only incorrect text but also executable programs that yield misleading results or violate safety constraints, posing a major barrier to safe deployment. Existing refusal strategies for such queries either rely on output-level instruction following, which is brittle due to model hallucinations, or estimate output uncertainty, which adds complexity and overhead. To address this challenge, we formalize safe refusal in text-to-SQL systems as an answerability-gating problem and propose LatentRefusal, a latent-signal refusal mechanism that predicts query answerability from intermediate hidden activations of a large language model. We introduce the Tri-Residual Gated Encoder, a lightweight probing architecture, to suppress schema noise and amplify sparse, localized cues of question-schema mismatch that indicate unanswerability. Extensive empirical evaluations across diverse ambiguous and unanswerable settings, together with ablation studies and interpretability analyses, demonstrate the effectiveness of the proposed approach and show that LatentRefusal provides an attachable and efficient safety layer for text-to-SQL systems. Across four benchmarks, LatentRefusal improves average F1 to 88.5 percent on both backbones while adding approximately 2 milliseconds of probe overhead.
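To make "predicting answerability from intermediate hidden activations" concrete, here is a toy, untrained probe in plain Python. It is not the paper's Tri-Residual Gated Encoder, whose internals are not described here. The per-token gate, the mix of gated mean and max pooling, and the three gated residual blocks (one hypothetical reading of "tri-residual") are all assumptions, chosen only to mirror the stated goals of damping diffuse schema noise while preserving sparse, localized cues. `ToyGatedResidualProbe` and its parameters are illustrative names.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(W: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in W]


class ToyGatedResidualProbe:
    """Hypothetical answerability probe over one layer's hidden states (untrained)."""

    def __init__(self, dim: int, n_blocks: int = 3, seed: int = 0, scale: float = 0.3):
        rng = random.Random(seed)

        def uni() -> float:
            return rng.uniform(-scale, scale)

        # Per-token gate weights: score how much each token should contribute.
        self.token_gate = [uni() for _ in range(dim)]
        # Each residual block: a dim x dim transform plus weights for a scalar gate.
        self.blocks = [
            ([[uni() for _ in range(dim)] for _ in range(dim)], [uni() for _ in range(dim)])
            for _ in range(n_blocks)
        ]
        # Linear head producing the answerability logit.
        self.head = [uni() for _ in range(dim)]

    def pool(self, hidden: List[Vector]) -> Vector:
        # Gate in (0, 1) per token, so diffuse (e.g. schema) tokens can be down-weighted.
        gates = [sigmoid(dot(self.token_gate, tok)) for tok in hidden]
        dim = len(hidden[0])
        total = sum(gates)
        mean = [sum(g * tok[i] for g, tok in zip(gates, hidden)) / total for i in range(dim)]
        # Max over tokens keeps a strong cue even if only one token carries it.
        peak = [max(g * tok[i] for g, tok in zip(gates, hidden)) for i in range(dim)]
        return [m + p for m, p in zip(mean, peak)]

    def forward(self, hidden: List[Vector]) -> float:
        """Return an estimated P(answerable) for a [tokens x dim] activation matrix."""
        x = self.pool(hidden)
        for W, w_gate in self.blocks:
            update = [math.tanh(v) for v in matvec(W, x)]
            g = sigmoid(dot(w_gate, x))  # how much of the update to let through
            x = [xi + g * ui for xi, ui in zip(x, update)]  # gated residual connection
        return sigmoid(dot(self.head, x))


# Usage with random stand-in activations (5 tokens, hidden size 8).
rng = random.Random(1)
hidden_states = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
probe = ToyGatedResidualProbe(dim=8)
print(f"P(answerable) = {probe.forward(hidden_states):.3f}")
```

With random weights the printed probability carries no meaning; the code only shows the data flow. A probe of this kind is typically trained with a binary cross-entropy loss on labeled answerable/unanswerable examples while the backbone stays frozen, and because it touches only one pooled vector per query its cost is small next to SQL generation.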