AI Summary
Natural language queries often suffer from semantic ambiguity, leading to inaccurate parsing in database systems.
Method: This paper proposes a data-aware Socratic interactive clarification framework that models conversational clarification as a first-class operator within database systems, enabling the system to proactively and selectively initiate semantic disambiguation questions. Leveraging joint modeling of linguistic ambiguity, schema-matching confidence, and multi-backend (relational/vector) execution costs, we design a cost-benefit-driven question-selection mechanism. Optimal questions are chosen by jointly optimizing semantic relevance, catalog information gain, and execution-cost reduction potential.
Results: Experiments on three benchmark datasets demonstrate substantial improvements in query accuracy while limiting average interaction rounds to 1.2 to 1.8. The framework achieves a balanced trade-off between precision and efficiency, establishing a new paradigm for collaborative natural language database interaction.
Abstract
In this paper, we propose Data-Aware Socratic Guidance (DASG), a dialogue-based query enhancement framework that embeds interactive clarification as a first-class operator within database systems to resolve ambiguity in natural language queries. DASG treats dialogue as an optimization decision, asking clarifying questions only when the expected execution cost reduction exceeds the interaction overhead. The system quantifies ambiguity through linguistic fuzziness, schema grounding confidence, and projected costs across relational and vector backends. Our algorithm selects the optimal clarifications by combining semantic relevance, catalog-based information gain, and potential cost reduction. We evaluate the framework on three datasets. The results show that DASG improves query precision while maintaining efficiency, establishing a cooperative analytics paradigm in which systems actively participate in query formulation rather than passively translating user requests.
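The decision rule described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the names `Clarification`, `score`, and `select_clarification`, the equal weights, the `cost_scale` normalizer, and the linear weighted sum are all assumptions made for illustration. The abstract only states that a question is asked when the expected cost reduction exceeds the interaction overhead, and that selection combines semantic relevance, catalog-based information gain, and potential cost reduction.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Clarification:
    """A candidate clarifying question (hypothetical structure)."""
    text: str
    semantic_relevance: float  # assumed to lie in [0, 1]
    information_gain: float    # assumed to lie in [0, 1]
    cost_reduction: float      # expected execution cost saved, same units as the overhead


def score(
    c: Clarification,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
    cost_scale: float = 100.0,
) -> float:
    """Combine the three signals with a weighted sum (an assumed combination rule)."""
    w_rel, w_gain, w_cost = weights
    # Normalize the cost saving into [0, 1] so the three terms are comparable.
    normalized_saving = min(max(c.cost_reduction / cost_scale, 0.0), 1.0)
    return (
        w_rel * c.semantic_relevance
        + w_gain * c.information_gain
        + w_cost * normalized_saving
    )


def select_clarification(
    candidates: Sequence[Clarification],
    interaction_overhead: float,
) -> Optional[Clarification]:
    """Return the best question worth asking, or None if no question pays for itself."""
    # Gate: ask only when the expected cost reduction exceeds the interaction overhead.
    worthwhile = [c for c in candidates if c.cost_reduction > interaction_overhead]
    if not worthwhile:
        return None
    return max(worthwhile, key=score)
```

Returning `None` corresponds to executing the query without any dialogue, which keeps clarification selective rather than mandatory. In a real system the weights and the cost units would presumably be calibrated against the optimizer's cost model for each backend.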