🤖 AI Summary
This work addresses the challenge of identifying a compact yet sufficient context from large and ambiguous database schemas in large-scale Text-to-SQL tasks. The authors propose an uncertainty-aware, multi-path schema linking approach that infers schema requirements across multiple plausible SQL derivations, distinguishing required schema items from uncertain ones that are needed only on specific paths. By acquiring additional evidence only where it is needed, the method departs from the conventional paradigm of deterministic selection around a single SQL path and leverages large language models for efficient context filtering. Evaluated on Spider2-Snow, the approach achieves a field-level strict recall of 90.15% while using 123.30K tokens on average, and improves downstream SQL generation performance under a fixed generator.
📝 Abstract
Schema linking is a difficult and important step in large-scale Text-to-SQL, where systems must identify a compact yet sufficient schema context from large and ambiguous databases. Existing methods often treat schema linking as deterministic selection around a single SQL path, but complex questions may admit multiple valid realizations with different schema needs. We reframe schema linking as uncertainty-aware schema-need inference over multiple plausible SQL paths, where the system distinguishes required schema items from path-dependent uncertain ones and acquires evidence only where needed. We instantiate this reframing with EviLink, which combines multi-hypothesis schema grounding with uncertainty-guided evidence acquisition. Experiments on BIRD-Dev and Spider2-Snow show that this perspective improves the balance among schema completeness, schema relevance, and token cost. On Spider2-Snow, EviLink achieves a field-level strict recall of 90.15% while using 123.30K tokens on average, and it improves downstream SQL generation under a fixed generator.
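To make the distinction between required and path-dependent schema items concrete, the following is a minimal sketch, not EviLink's actual procedure. It assumes each plausible SQL path has already been reduced to the set of `table.column` fields it needs, and it treats fields shared by every path as required and all remaining fields as uncertain. The function name `split_schema_needs` and the example field names are hypothetical.

```python
from typing import Iterable


def split_schema_needs(paths: Iterable[Iterable[str]]) -> tuple[set[str], set[str]]:
    """Partition schema fields into (required, uncertain).

    Assumption for illustration only: a field is "required" if every
    candidate SQL path uses it, and "uncertain" if only some paths do.
    """
    path_sets = [set(p) for p in paths]
    if not path_sets:
        return set(), set()
    required = set.intersection(*path_sets)
    uncertain = set.union(*path_sets) - required
    return required, uncertain


# Two hypothetical realizations of the same question: one joins a base
# table, the other reads a pre-aggregated summary table.
candidate_paths = [
    {"orders.id", "orders.customer_id", "customers.name"},
    {"orders.id", "orders.customer_id", "customer_summary.name"},
]

required, uncertain = split_schema_needs(candidate_paths)
print(sorted(required))
print(sorted(uncertain))
```

Under this simplification, only the uncertain fields would be candidates for further evidence acquisition, which is where the savings in token cost described in the abstract would come from.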