ORANGE: An Online Reflection ANd GEneration framework with Domain Knowledge for Text-to-SQL

📅 2025-11-02
🤖 AI Summary
Problem: Large language models (LLMs) struggle to bridge the semantic gap between their general knowledge and domain-specific database semantics in Text-to-SQL tasks. Method: The paper proposes ORANGE, an online self-evolving framework that continuously mines historical translation logs, parsing the SQL queries they contain to extract and accumulate domain knowledge at both the schema-structure and instance-data levels. It introduces a nested Chain-of-Thought SQL-to-Text strategy with tuple-semantic tracking, which reduces semantic errors when knowledge is generated and updated. The resulting domain knowledge base is lightweight and evolvable, and requires no human annotation or model fine-tuning. Contribution/Results: Experiments on benchmarks including Spider and BIRD show improved SQL generation accuracy, particularly for complex and domain-specific queries, supporting the framework's practicality for real-world deployment.

📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable progress in translating natural language to SQL, but a significant semantic gap persists between their general knowledge and domain-specific semantics of databases. Historical translation logs constitute a rich source of this missing in-domain knowledge, where SQL queries inherently encapsulate real-world usage patterns of database schema. Existing methods primarily enhance the reasoning process for individual translations but fail to accumulate in-domain knowledge from past translations. We introduce ORANGE, an online self-evolutionary framework that constructs database-specific knowledge bases by parsing SQL queries from translation logs. By accumulating in-domain knowledge that contains schema and data semantics, ORANGE progressively reduces the semantic gap and enhances the accuracy of subsequent SQL translations. To ensure reliability, we propose a novel nested Chain-of-Thought SQL-to-Text strategy with tuple-semantic tracking, which reduces semantic errors during knowledge generation. Experiments on multiple benchmarks confirm the practicality of ORANGE, demonstrating its effectiveness for real-world Text-to-SQL deployment, particularly in handling complex and domain-specific queries.
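
To make the idea of accumulating in-domain knowledge from translation logs more concrete, here is a minimal Python sketch. It is not ORANGE's implementation: the log format, the regular expressions, the function name `build_knowledge_base`, and the shape of the resulting knowledge base are all assumptions made for illustration. It only shows how parsing logged SQL can surface which tables are used and which literal values appear in filters.

```python
import re
from collections import Counter, defaultdict

# Hypothetical log format (an assumption): (question, SQL) pairs from past translations.
LOGS = [
    ("How many singers are from France?",
     "SELECT COUNT(*) FROM singer WHERE country = 'France'"),
    ("List concerts held in 2014",
     "SELECT concert_name FROM concert WHERE year = '2014'"),
    ("Which singers are from Japan?",
     "SELECT name FROM singer WHERE country = 'Japan'"),
]

# Simplified patterns: table names after FROM/JOIN, and column = 'literal' filters.
# A real system would use a proper SQL parser instead of regular expressions.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)", re.IGNORECASE)
FILTER_RE = re.compile(r"\b([A-Za-z_]\w*)\s*=\s*'([^']*)'")


def build_knowledge_base(logs):
    """Accumulate schema-level and data-level hints from logged SQL queries."""
    table_usage = Counter()            # schema level: how often each table is queried
    column_values = defaultdict(set)   # data level: literal values seen per column
    for _question, sql in logs:
        table_usage.update(t.lower() for t in TABLE_RE.findall(sql))
        for column, value in FILTER_RE.findall(sql):
            column_values[column.lower()].add(value)
    return {"table_usage": table_usage, "column_values": dict(column_values)}


kb = build_knowledge_base(LOGS)
```

In this toy setup, `kb["column_values"]["country"]` collects the country literals seen so far, which is the kind of data-level hint that could be added to a later prompt. Because the knowledge base is rebuilt from whatever logs exist, it grows as more translations are recorded.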

Problem

Research questions and friction points this paper is trying to address.

Bridging the semantic gap between LLMs' general knowledge and domain-specific database semantics
Accumulating in-domain knowledge from historical SQL translation logs
Reducing semantic errors in complex domain-specific query generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Online framework constructs domain-specific knowledge bases
Nested Chain-of-Thought strategy ensures reliable knowledge generation
Self-evolutionary system accumulates knowledge from translation logs
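
The summary above does not describe how the nested Chain-of-Thought strategy works internally. As a purely illustrative sketch, and assuming that "nested" refers to handling subqueries from the inside out, the following Python function lists the parenthesized subqueries of a SQL string innermost first, followed by the full query. The function name `subqueries_innermost_first` and the example query are hypothetical, and the paren matching ignores parentheses inside string literals.

```python
def subqueries_innermost_first(sql):
    """Return parenthesized SELECT subqueries, innermost first, then the full query.

    Simplification (assumption): parentheses inside string literals are not handled.
    """
    stack = []   # positions of currently open parentheses
    found = []   # subqueries in the order their closing parenthesis appears
    for i, ch in enumerate(sql):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            start = stack.pop()
            inner = sql[start + 1:i].strip()
            # Keep only real subqueries, not function calls such as AVG(age).
            if inner.upper().startswith("SELECT"):
                found.append(inner)
    found.append(sql.strip())
    return found


query = (
    "SELECT name FROM singer WHERE age > "
    "(SELECT AVG(age) FROM singer WHERE country IN "
    "(SELECT country FROM concert))"
)
steps = subqueries_innermost_first(query)
```

Each element of `steps` could then be described in natural language in turn, so that the description of an outer query builds on the descriptions of the queries it contains.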

Authors

Yiwen Jiao (Fudan University, China)
Tonghui Ren (Tencent Cloud, China)
Yuche Gao (University of Cambridge, United Kingdom)
Zhenying He (Fudan University, China)
Yinan Jing (Fudan University, China)
Kai Zhang (Fudan University, China)
X. Sean Wang (School of Computer Science, Fudan University)