Business Logic-Driven Text-to-SQL Data Synthesis for Business Intelligence

📅 2026-01-20
🤖 AI Summary
This work addresses a limitation of existing synthetic Text-to-SQL evaluation data: it lacks authentic business logic and workflows, so it cannot effectively assess model performance in private business intelligence (BI) settings. To bridge this gap, the authors propose a business-logic-driven synthetic data generation framework that systematically integrates business personas, work scenarios, and workflows, and adds a mechanism to control reasoning complexity. The framework produces realistic and diverse Text-to-SQL examples through business modeling, scenario-guided generation, complexity regulation, and SQL semantic alignment validation. Evaluated on a production-scale Salesforce database, the synthesized data achieves 98.44% business realism, substantially outperforming OmniSQL and SQL-Factory, and reveals that state-of-the-art models attain only 42.86% execution accuracy on the most complex business queries.

📝 Abstract
Evaluating Text-to-SQL agents in private business intelligence (BI) settings is challenging due to the scarcity of realistic, domain-specific data. While synthetic evaluation data offers a scalable solution, existing generation methods fail to capture business realism, that is, whether questions reflect realistic business logic and workflows. We propose a Business Logic-Driven Data Synthesis framework that generates data grounded in business personas, work scenarios, and workflows. In addition, we improve data quality by imposing a business reasoning complexity control strategy that diversifies the analytical reasoning steps required to answer the questions. Experiments on a production-scale Salesforce database show that our synthesized data achieves high business realism (98.44%), substantially outperforming OmniSQL (+19.5%) and SQL-Factory (+54.7%), while maintaining strong question-SQL alignment (98.59%). Our synthetic data also reveals that state-of-the-art Text-to-SQL models still have significant performance gaps, achieving only 42.86% execution accuracy on the most complex business queries.
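The abstract reports execution accuracy without describing how it is computed. As a rough illustration only, the sketch below uses a common definition of the metric: a predicted query counts as correct when it runs and returns the same rows as the reference query, ignoring row order. The paper's exact protocol is not stated here, so this comparison rule is an assumption, and the `opportunity` table, its rows, and the helper names are hypothetical.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match.

    Assumption: results are compared as order-insensitive multisets of rows.
    """
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        # A failing predicted query (None) never matches a valid gold result.
        if gold is not None and predicted == gold:
            correct += 1
    return correct / len(pairs)


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical CRM-style table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE opportunity (id INTEGER PRIMARY KEY, stage TEXT, amount REAL);
        INSERT INTO opportunity VALUES
            (1, 'Closed Won', 100.0),
            (2, 'Closed Won', 250.0),
            (3, 'Prospecting', 75.0);
        """
    )
    return conn


conn = build_demo_db()
pairs = [
    # Identical queries: results match.
    ("SELECT SUM(amount) FROM opportunity WHERE stage = 'Closed Won'",
     "SELECT SUM(amount) FROM opportunity WHERE stage = 'Closed Won'"),
    # Missing filter: the counts differ.
    ("SELECT COUNT(*) FROM opportunity",
     "SELECT COUNT(*) FROM opportunity WHERE stage = 'Closed Won'"),
    # Predicted query references a table that does not exist.
    ("SELECT amount FROM missing_table",
     "SELECT amount FROM opportunity"),
    # Same rows in a different order: treated as a match.
    ("SELECT id FROM opportunity WHERE amount > 90 ORDER BY id DESC",
     "SELECT id FROM opportunity WHERE amount > 90"),
]
score = execution_accuracy(conn, pairs)
```

Under this definition two of the four demo pairs match, so `score` is 0.5. A stricter variant could compare ordered row lists when the reference query contains `ORDER BY`.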
Problem

Research questions and friction points this paper is trying to address.

Text-to-SQL
Business Intelligence
Data Synthesis
Business Logic
Evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Business Logic-Driven
Text-to-SQL
Data Synthesis
Business Intelligence
Reasoning Complexity Control