GenEdit: Compounding Operators and Continuous Improvement to Tackle Text-to-SQL in the Enterprise

📅 2025-03-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Enterprise Text-to-SQL systems struggle to integrate domain-specific business knowledge, to generate complex SQL queries accurately, and to evolve continuously. To address these challenges, we propose a feedback-driven Text-to-SQL system tailored for enterprise settings. Our method introduces: (1) a mechanism that evolves the knowledge set dynamically based on user feedback; (2) a multi-stage pipeline of compositional operators that splits generation into knowledge retrieval, chain-of-thought-driven hierarchical query planning (supporting subqueries, directives, and schema elements), and verifiable SQL execution, augmented by a regeneration loop triggered by syntactic or semantic errors; and (3) an integrated interactive copilot interface with regression-test-guided knowledge merging. Evaluated in real-world enterprise environments, our system significantly improves accuracy on complex SQL generation and adaptability to business needs, delivering low-latency, highly interpretable, and sustainably evolving Text-to-SQL services.

📝 Abstract

Recent advancements in Text-to-SQL, driven by large language models, are democratizing data access. Despite these advancements, enterprise deployments remain challenging due to the need to capture business-specific knowledge, handle complex queries, and meet expectations of continuous improvement. To address these issues, we designed and implemented GenEdit, a Text-to-SQL generation system that improves with user feedback. GenEdit builds and maintains a company-specific knowledge set, employs a pipeline of operators that decomposes SQL generation, and uses feedback to update its knowledge set to improve future SQL generations. We describe GenEdit's architecture, which consists of two core modules: (i) decomposed SQL generation; and (ii) knowledge set edits based on user feedback. For generation, GenEdit leverages compounding operators to improve knowledge retrieval and to create a plan, expressed as chain-of-thought steps, that guides generation. GenEdit first retrieves relevant examples in an initial retrieval stage, where the original SQL queries are decomposed into sub-statements, clauses, or sub-queries. It then retrieves instructions and schema elements. Using the retrieved contextual information, GenEdit generates a step-by-step plan in natural language describing how to produce the query. Finally, GenEdit uses the plan to generate SQL, minimizing the need for model reasoning at generation time, which improves complex SQL generation. If necessary, GenEdit regenerates the query based on syntactic and semantic errors. The knowledge set edits are recommended through an interactive copilot, allowing users to iterate on their feedback and to regenerate SQL queries as needed. Each generation uses the staged edits, which update the generation prompt. Once the feedback is submitted, it is merged after passing regression testing and obtaining approval, improving future generations.
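
The generation flow described in the abstract (retrieve examples, instructions, and schema elements; write a natural-language plan; generate SQL from the plan; regenerate on errors) can be sketched roughly as follows. This is a minimal illustration and not GenEdit's implementation: the names `KnowledgeSet`, `retrieve`, `check_sql`, and `generate_sql`, the keyword-overlap retrieval, the `llm(stage, question, context, error)` callable, and the use of SQLite's `EXPLAIN` as the error check are all assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class KnowledgeSet:
    """Hypothetical container for the three kinds of retrieved knowledge."""
    examples: List[str] = field(default_factory=list)      # decomposed SQL fragments
    instructions: List[str] = field(default_factory=list)  # business rules
    schema: List[str] = field(default_factory=list)        # table/column descriptions


def retrieve(items: List[str], question: str, k: int = 3) -> List[str]:
    """Toy keyword-overlap ranking; a stand-in for a real retriever."""
    words = set(question.lower().split())
    ranked = sorted(items, key=lambda s: -len(words & set(s.lower().split())))
    return ranked[:k]


def check_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if SQLite can compile the query, else the error message."""
    try:
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)


def generate_sql(
    question: str,
    ks: KnowledgeSet,
    conn: sqlite3.Connection,
    llm: Callable[[str, str, Dict, Optional[str]], str],  # assumed interface
    max_attempts: int = 3,
) -> str:
    # Stage 1: retrieve examples, then instructions and schema elements.
    context = {
        "examples": retrieve(ks.examples, question),
        "instructions": retrieve(ks.instructions, question),
        "schema": retrieve(ks.schema, question),
    }
    # Stage 2: produce a step-by-step plan in natural language.
    plan = llm("plan", question, context, None)
    # Stage 3: generate SQL from the plan, regenerating when the check fails.
    error: Optional[str] = None
    for _ in range(max_attempts):
        sql = llm("sql", question, {**context, "plan": plan}, error)
        error = check_sql(conn, sql)
        if error is None:
            return sql
    raise RuntimeError(f"no valid SQL after {max_attempts} attempts: {error}")
```

The check here only catches errors the database itself reports (bad syntax, unknown tables or columns). The semantic errors mentioned in the abstract would need an additional validation step, which this sketch does not attempt.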

Problem

Research questions and friction points this paper is trying to address.

Capturing business-specific knowledge in enterprise Text-to-SQL deployments

Handling complex SQL queries with decomposed generation operators

Continuous system improvement via user feedback and knowledge updates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Compounding operators for SQL decomposition

User feedback-driven knowledge set updates

Staged prompt edits for continuous improvement
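
The feedback loop summarized above (edits are staged and affect the current prompt, then merged only after regression testing and approval) might look like the sketch below. It is an illustration under stated assumptions, not the paper's design: `KnowledgeStore`, its methods, the `generate(question, knowledge)` callable, and the use of exact string equality against expected SQL as the regression check are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class KnowledgeStore:
    """Hypothetical knowledge set with a staging area for pending edits."""
    merged: List[str] = field(default_factory=list)  # used by every generation
    staged: List[str] = field(default_factory=list)  # pending user feedback

    def prompt_knowledge(self) -> List[str]:
        # Staged edits already shape the prompt while the user iterates.
        return self.merged + self.staged

    def stage(self, edit: str) -> None:
        self.staged.append(edit)

    def submit(
        self,
        generate: Callable[[str, List[str]], str],  # assumed generator interface
        regression_suite: Dict[str, str],           # question -> expected SQL
        approved: bool,
    ) -> bool:
        """Merge staged edits only if the suite passes and a reviewer approved."""
        candidate = self.merged + self.staged
        passed = all(
            generate(question, candidate) == expected
            for question, expected in regression_suite.items()
        )
        if not (passed and approved):
            return False  # staged edits stay pending
        self.merged = candidate
        self.staged = []
        return True
```

Comparing generated SQL by string equality is the simplest possible check; a regression suite could instead compare query results, which would tolerate harmless differences in formatting.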

🔎 Similar Papers

A Survey on Employing Large Language Models for Text-to-SQL Tasks