ODIN: A NL2SQL Recommender to Handle Schema Ambiguity

📅 2025-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Ambiguous schema semantics—particularly semantic similarity among table/column names in enterprise databases—severely impair intent recognition accuracy in NL2SQL systems. To address this, we propose an ambiguity-aware dynamic multi-candidate SQL recommendation framework. First, we construct a schema-aware ambiguity graph to quantify semantic ambiguity across database elements. Second, leveraging large language models’ semantic understanding and confidence estimation, we adaptively generate *K* candidate SQL queries, where *K* is dynamically determined by the inferred ambiguity level. Third, we employ implicit user feedback to drive online reinforcement learning for personalized candidate ranking. This work is the first to deeply integrate ambiguity modeling, dynamic candidate generation, and feedback-driven personalization. Evaluated on real-world enterprise datasets, our method achieves a 1.5×–2× improvement in exact-match SQL retrieval rate over state-of-the-art approaches, significantly mitigating query bias under semantic ambiguity.

📝 Abstract

NL2SQL (natural language to SQL) systems translate natural language into SQL queries, allowing users with no technical background to interact with databases and create tools like reports or visualizations. While recent advancements in large language models (LLMs) have significantly improved NL2SQL accuracy, schema ambiguity remains a major challenge in enterprise environments with complex schemas, where multiple tables and columns with semantically similar names often co-exist. To address schema ambiguity, we introduce ODIN, a NL2SQL recommendation engine. Instead of producing a single SQL query given a natural language question, ODIN generates a set of potential SQL queries by accounting for different interpretations of ambiguous schema components. ODIN dynamically adjusts the number of suggestions based on the level of ambiguity, and ODIN learns from user feedback to personalize future SQL query recommendations. Our evaluation shows that ODIN improves the likelihood of generating the correct SQL query by 1.5-2$\times$ compared to baselines.
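
To make "different interpretations of ambiguous schema components" concrete, here is a minimal sketch. It is not ODIN's actual algorithm: it assumes a hypothetical mapping from question phrases to the columns they might refer to, and fills a SQL template once per combination. The table name, column names, and the `enumerate_queries` helper are all invented for illustration.

```python
from itertools import product

# Hypothetical: each ambiguous phrase maps to the schema columns it might mean.
candidates = {
    "revenue": ["sales.gross_revenue", "sales.net_revenue"],
    "region": ["sales.region_code", "sales.region_name"],
}

def enumerate_queries(template, candidates):
    """Fill the template once for every combination of candidate columns."""
    phrases = list(candidates)
    queries = []
    for combo in product(*(candidates[p] for p in phrases)):
        binding = dict(zip(phrases, combo))  # phrase -> chosen column
        queries.append(template.format(**binding))
    return queries

template = "SELECT {region}, SUM({revenue}) FROM sales GROUP BY {region}"
queries = enumerate_queries(template, candidates)
for q in queries:
    print(q)
```

With two candidate columns for each of two phrases, the sketch yields four candidate queries. A practical system would need to prune such a set, since the number of combinations grows multiplicatively with each ambiguous phrase.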

Problem

Research questions and friction points this paper is trying to address.

Addresses schema ambiguity in NL2SQL systems for complex databases

Generates multiple SQL queries to handle ambiguous schema interpretations

Improves the likelihood of generating the correct SQL query by 1.5-2x compared to baselines

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates multiple SQL queries for ambiguity

Dynamically adjusts suggestions based on ambiguity

Learns from feedback to personalize recommendations
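
The last two points can be illustrated with a small sketch. It is an assumption-laden toy, not the method from the paper: `choose_k` picks the smallest number of suggestions whose scores cover a chosen share of the total (so a clear winner yields one suggestion and a flat score distribution yields several), and `rerank` orders candidates by how often the user previously chose queries using the same columns. The function names, the `mass` and `max_k` parameters, and the scoring scheme are all hypothetical.

```python
from collections import Counter

def choose_k(scores, max_k=5, mass=0.8):
    """Smallest k whose top-k scores cover `mass` of the total, capped at max_k."""
    total = sum(scores)
    ranked = sorted(scores, reverse=True)
    covered = 0.0
    for k, score in enumerate(ranked, start=1):
        covered += score / total
        if covered >= mass or k == max_k:
            return k
    return len(ranked)

# Hypothetical feedback store: how often each column appeared in a chosen query.
preferences = Counter()

def record_choice(columns_used):
    """Record the columns of the query the user accepted."""
    preferences.update(columns_used)

def rerank(candidates):
    """Sort (sql, columns_used) pairs by past preference; ties keep their order."""
    return sorted(candidates, key=lambda c: -sum(preferences[col] for col in c[1]))

print(choose_k([95, 3, 2]))     # one dominant interpretation
print(choose_k([4, 3, 2, 1]))   # scores spread across interpretations

record_choice(["sales.net_revenue"])
ranked = rerank([
    ("SELECT SUM(gross_revenue) FROM sales", ["sales.gross_revenue"]),
    ("SELECT SUM(net_revenue) FROM sales", ["sales.net_revenue"]),
])
print(ranked[0][0])
```

Counting column choices is only the simplest possible stand-in for personalization; it ignores context such as which question was asked.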
