Business Logic-Driven Text-to-SQL Data Synthesis for Business Intelligence

📅 2026-01-20

🤖 AI Summary

This work addresses a limitation of existing Text-to-SQL evaluation data: it lacks realistic business logic and workflows, so it cannot reliably assess model performance in private business intelligence (BI) settings. To bridge this gap, the authors propose a Business Logic-Driven Data Synthesis framework that grounds generation in business personas, work scenarios, and workflows, and adds a control strategy that diversifies the reasoning complexity of the generated questions. The pipeline combines business modeling, scenario-guided generation, complexity regulation, and validation of question-SQL semantic alignment. Evaluated on a production-scale Salesforce database, the synthesized data achieves 98.44% business realism, outperforming OmniSQL (+19.5%) and SQL-Factory (+54.7%), while maintaining 98.59% question-SQL alignment. The data also shows that state-of-the-art Text-to-SQL models attain only 42.86% execution accuracy on the most complex business queries.

📝 Abstract

Evaluating Text-to-SQL agents in private business intelligence (BI) settings is challenging due to the scarcity of realistic, domain-specific data. While synthetic evaluation data offers a scalable solution, existing generation methods fail to capture business realism, that is, whether questions reflect realistic business logic and workflows. We propose a Business Logic-Driven Data Synthesis framework that generates data grounded in business personas, work scenarios, and workflows. In addition, we improve data quality by imposing a business reasoning complexity control strategy that diversifies the analytical reasoning steps required to answer the questions. Experiments on a production-scale Salesforce database show that our synthesized data achieves high business realism (98.44%), substantially outperforming OmniSQL (+19.5%) and SQL-Factory (+54.7%), while maintaining strong question-SQL alignment (98.59%). Our synthetic data also reveals that state-of-the-art Text-to-SQL models still have significant performance gaps, achieving only 42.86% execution accuracy on the most complex business queries.
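
The abstract reports execution accuracy without spelling out how it is computed. As a rough illustration only, the sketch below uses a common definition of the metric: a predicted query counts as correct when it returns the same multiset of rows as the gold query on the same database. The `opportunity` table, the sample queries, and the helper names are all hypothetical and are not taken from the paper, whose exact evaluation protocol may differ.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same multiset of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to execute is counted as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Counter ignores row order but keeps duplicate rows.
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) query pairs whose results match."""
    matches = [execution_match(conn, pred, gold) for pred, gold in pairs]
    return sum(matches) / len(matches) if matches else 0.0


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE opportunity (id INTEGER, stage TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO opportunity VALUES (?, ?, ?)",
        [(1, "Closed Won", 1000.0), (2, "Closed Won", 500.0), (3, "Prospecting", 200.0)],
    )
    return conn


conn = build_demo_db()
pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT SUM(amount) FROM opportunity WHERE stage LIKE 'Closed%'",
     "SELECT SUM(amount) FROM opportunity WHERE stage = 'Closed Won'"),
    # Missing filter, different result: counted as incorrect.
    ("SELECT COUNT(*) FROM opportunity",
     "SELECT COUNT(*) FROM opportunity WHERE stage = 'Closed Won'"),
    # Nonexistent table, execution error: counted as incorrect.
    ("SELECT amount FROM opportunities",
     "SELECT amount FROM opportunity"),
    # Same rows in a different order: counted as correct.
    ("SELECT id FROM opportunity WHERE amount > 300 ORDER BY id DESC",
     "SELECT id FROM opportunity WHERE amount > 300"),
]
print(execution_accuracy(conn, pairs))
```

Comparing row multisets rather than SQL strings lets semantically equivalent queries score as correct. An order-insensitive comparison is a simplification, though: it would wrongly accept a prediction that ignores an `ORDER BY` the question actually requires.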

Problem

Research questions and friction points this paper is trying to address.

Text-to-SQL

Business Intelligence

Data Synthesis

Business Logic

Evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Business Logic-Driven

Text-to-SQL

Data Synthesis

Business Intelligence

Reasoning Complexity Control

🔎 Similar Papers

A Survey on Employing Large Language Models for Text-to-SQL Tasks

2024-07-21 · arXiv.org · Citations: 24
