ORANGE: An Online Reflection ANd GEneration framework with Domain Knowledge for Text-to-SQL

📅 2025-11-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Large language models (LLMs) struggle to bridge the semantic gap between their general knowledge and the domain-specific semantics of databases in Text-to-SQL tasks. Method: The paper proposes ORANGE, an online self-evolving framework that continuously mines historical translation logs, parsing the SQL queries they contain to extract and accumulate domain knowledge at both the schema level and the data level. To keep the generated knowledge reliable, it introduces a nested Chain-of-Thought SQL-to-Text strategy with tuple-semantic tracking, which reduces semantic errors as the knowledge base is updated. The resulting lightweight, evolvable domain knowledge base requires no human annotation or model fine-tuning. Contribution/Results: Experiments on benchmarks, including Spider and BIRD, show improved SQL generation accuracy, particularly for complex and domain-specific queries, supporting the framework's practicality for real-world deployment.

📝 Abstract

Large Language Models (LLMs) have demonstrated remarkable progress in translating natural language to SQL, but a significant semantic gap persists between their general knowledge and domain-specific semantics of databases. Historical translation logs constitute a rich source of this missing in-domain knowledge, where SQL queries inherently encapsulate real-world usage patterns of database schema. Existing methods primarily enhance the reasoning process for individual translations but fail to accumulate in-domain knowledge from past translations. We introduce ORANGE, an online self-evolutionary framework that constructs database-specific knowledge bases by parsing SQL queries from translation logs. By accumulating in-domain knowledge that contains schema and data semantics, ORANGE progressively reduces the semantic gap and enhances the accuracy of subsequent SQL translations. To ensure reliability, we propose a novel nested Chain-of-Thought SQL-to-Text strategy with tuple-semantic tracking, which reduces semantic errors during knowledge generation. Experiments on multiple benchmarks confirm the practicality of ORANGE, demonstrating its effectiveness for real-world Text-to-SQL deployment, particularly in handling complex and domain-specific queries.
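As a rough illustration of what accumulating in-domain knowledge from translation logs could look like, the sketch below parses logged SQL with simple regular expressions and records which tables were queried and which literal values each column was filtered on. This is not the ORANGE implementation: the log format, the `build_knowledge_base` and `knowledge_hints` helpers, and the regex-based parsing are all assumptions made for the example, and a real system would use a proper SQL parser.

```python
import re
from collections import defaultdict

# Hypothetical log format: (question, sql) pairs from earlier translations.
LOGS = [
    ("How many active customers are there?",
     "SELECT COUNT(*) FROM customers WHERE status = 'A'"),
    ("List orders shipped to Berlin",
     "SELECT id FROM orders WHERE ship_city = 'Berlin'"),
    ("Which customers are inactive?",
     "SELECT name FROM customers WHERE status = 'I'"),
]

# Crude patterns (assumption): table names after FROM/JOIN,
# and equality filters against quoted string literals.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)", re.IGNORECASE)
FILTER_RE = re.compile(r"([A-Za-z_]\w*)\s*=\s*'([^']*)'")


def build_knowledge_base(logs):
    """Accumulate per-table usage knowledge from logged SQL queries."""
    kb = defaultdict(lambda: {"questions": [], "filters": defaultdict(set)})
    for question, sql in logs:
        tables = TABLE_RE.findall(sql)
        filters = FILTER_RE.findall(sql)
        for table in tables:
            entry = kb[table.lower()]
            entry["questions"].append(question)
            # Simplification: every filter is attributed to every table in
            # the query, which is only accurate for single-table queries.
            for column, value in filters:
                entry["filters"][column.lower()].add(value)
    return kb


def knowledge_hints(kb, table):
    """Render accumulated knowledge as short hints for a later prompt."""
    entry = kb.get(table.lower())
    if entry is None:
        return []
    return [
        f"{table}.{column} has been filtered on values {sorted(values)}"
        for column, values in sorted(entry["filters"].items())
    ]


if __name__ == "__main__":
    kb = build_knowledge_base(LOGS)
    for hint in knowledge_hints(kb, "customers"):
        print(hint)
```

Hints like these could be added to the prompt of a later translation, so that the model learns, for instance, that `status` stores single-letter codes rather than words such as "active".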

Problem

Research questions and friction points this paper is trying to address.

Bridging the semantic gap between LLMs' general knowledge and domain-specific database semantics

Accumulating in-domain knowledge from historical SQL translation logs

Reducing semantic errors in complex domain-specific query generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Online framework constructs domain-specific knowledge bases

Nested Chain-of-Thought strategy ensures reliable knowledge generation

Self-evolutionary system accumulates knowledge from translation logs
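The page does not spell out how the nested Chain-of-Thought SQL-to-Text strategy works. One plausible reading, offered here only as an assumption, is that a nested query is described from the inside out, so each subquery has a textual explanation before the query that encloses it is explained. The hypothetical helper below covers only the decomposition step: it returns the parenthesised `SELECT` subqueries innermost first, followed by the full query. It does not call an LLM and does not attempt tuple-semantic tracking.

```python
def subqueries_innermost_first(sql):
    """Return parenthesised SELECT subqueries, innermost first, then the full query.

    Simplification (assumption): parentheses inside string literals are not
    handled, so this is suitable only for illustrative queries.
    """
    stack = []   # positions of currently open parentheses
    found = []   # subqueries in the order their parentheses close
    for i, ch in enumerate(sql):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            start = stack.pop()
            inner = sql[start + 1:i].strip()
            # Keep only real subqueries, not function calls such as AVG(...).
            if inner.upper().startswith("SELECT"):
                found.append(inner)
    found.append(sql.strip())
    return found


if __name__ == "__main__":
    query = (
        "SELECT name FROM emp WHERE dept_id IN "
        "(SELECT id FROM dept WHERE budget > (SELECT AVG(budget) FROM dept))"
    )
    for step, part in enumerate(subqueries_innermost_first(query), start=1):
        print(f"step {step}: {part}")
```

Because an inner parenthesis always closes before the one that encloses it, collecting subqueries in closing order yields the inside-out sequence without any explicit sorting.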

🔎 Similar Papers

A Survey on Employing Large Language Models for Text-to-SQL Tasks