🤖 AI Summary
This work addresses a limitation of existing Text-to-SQL evaluation benchmarks: they lack authentic business logic and workflows, so they cannot effectively assess model performance in private business intelligence (BI) settings. To bridge this gap, the authors propose the first business-logic-driven synthetic data generation framework. It systematically integrates business personas, scenarios, and workflows, and adds a mechanism to control reasoning complexity. The framework produces high-fidelity, diverse Text-to-SQL examples through business modeling, scenario-guided generation, complexity control, and validation of question-SQL semantic alignment. Evaluated on a production-scale Salesforce database, the synthesized data achieves 98.44% business realism, substantially outperforming OmniSQL and SQL-Factory. It also reveals that state-of-the-art models attain only 42.86% execution accuracy on the most complex business queries.
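The generation side of the pipeline summarized above can be illustrated with a minimal sketch. It assumes that each generation request pairs a business context (persona, scenario, workflow) with a target number of analytical reasoning steps. It realizes complexity control by cycling through the complexity levels so that each level is equally represented. All class names, function names, example personas, and prompt wording are hypothetical. This is not the authors' implementation, and the business modeling and SQL alignment validation stages are omitted.

```python
import random
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class BusinessContext:
    """Hypothetical business context that a generated question is grounded in."""
    persona: str   # who asks, e.g. "sales operations manager"
    scenario: str  # the work situation, e.g. "quarterly pipeline review"
    workflow: str  # the task being carried out, e.g. "compare win rates by region"


@dataclass(frozen=True)
class GenerationSpec:
    context: BusinessContext
    reasoning_steps: int  # target number of analytical reasoning steps


def build_specs(contexts, complexity_levels, n, seed=0):
    """Pair randomly chosen contexts with complexity levels assigned round-robin,
    so every level is (near-)equally represented in the generated set."""
    rng = random.Random(seed)
    levels = cycle(complexity_levels)
    return [GenerationSpec(rng.choice(contexts), next(levels)) for _ in range(n)]


def to_prompt(spec):
    """Render a spec as an instruction for a question-generating LLM (hypothetical wording)."""
    c = spec.context
    return (
        f"You are a {c.persona} during a {c.scenario}. "
        f"While working on the task '{c.workflow}', write a question about the "
        f"database that needs {spec.reasoning_steps} analytical reasoning step(s) to answer."
    )


if __name__ == "__main__":
    contexts = [
        BusinessContext("sales operations manager", "quarterly pipeline review",
                        "compare win rates by region"),
        BusinessContext("account executive", "renewal planning",
                        "find accounts with contracts expiring next quarter"),
    ]
    for spec in build_specs(contexts, complexity_levels=[1, 2, 3], n=6):
        print(to_prompt(spec))
```

Assigning levels round-robin rather than sampling them at random is one simple way to guarantee a balanced spread of reasoning depth. A real system would also need to verify that each generated question actually requires the targeted number of steps.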
📝 Abstract
Evaluating Text-to-SQL agents in private business intelligence (BI) settings is challenging due to the scarcity of realistic, domain-specific data. While synthetic evaluation data offers a scalable solution, existing generation methods fail to capture business realism: whether questions reflect realistic business logic and workflows. We propose a Business Logic-Driven Data Synthesis framework that generates data grounded in business personas, work scenarios, and workflows. In addition, we improve data quality with a business reasoning complexity control strategy that diversifies the analytical reasoning steps required to answer the questions. Experiments on a production-scale Salesforce database show that our synthesized data achieves high business realism (98.44%), substantially outperforming OmniSQL (+19.5%) and SQL-Factory (+54.7%), while maintaining strong question-SQL alignment (98.59%). Our synthetic data also reveals that state-of-the-art Text-to-SQL models still have significant performance gaps, achieving only 42.86% execution accuracy on the most complex business queries.
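Execution accuracy, the metric behind the 42.86% figure, is conventionally computed by running the predicted SQL and the reference (gold) SQL against the same database. A question counts as correct when the two result sets match. The sketch below assumes SQLite, scores a predicted query that fails to execute as incorrect, and compares results as multisets (ignoring row order). The abstract does not state the paper's exact matching rules, and the `opportunity` table and queries are hypothetical.

```python
import sqlite3
from collections import Counter


def results_match(conn, pred_sql, gold_sql):
    """True if both queries return the same multiset of rows (row order ignored)."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a predicted query that fails to execute is scored as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose execution results match."""
    if not pairs:
        return 0.0
    return sum(results_match(conn, p, g) for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE opportunity (id INTEGER, stage TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO opportunity VALUES (?, ?, ?)",
        [(1, "Closed Won", 100.0), (2, "Closed Lost", 50.0), (3, "Closed Won", 25.0)],
    )
    gold = "SELECT SUM(amount) FROM opportunity WHERE stage = 'Closed Won'"
    pairs = [
        (gold, gold),                                   # correct
        ("SELECT SUM(amount) FROM opportunity", gold),  # runs, but wrong result
        ("SELECT SUM(amt) FROM opportunity", gold),     # fails: no such column
    ]
    print(execution_accuracy(conn, pairs))  # 1 of 3 pairs match
```

Ignoring row order is a design choice. Benchmarks that grade `ORDER BY` questions would compare the row lists directly instead of their `Counter` multisets.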