SQLyzr: A Comprehensive Benchmark and Evaluation Platform for Text-to-SQL

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing Text-to-SQL evaluation benchmarks, which often rely on a single metric and do not simulate real-world query scenarios, making fine-grained model diagnosis difficult. The authors propose a multidimensional evaluation platform that combines workload alignment with real-world SQL usage, a fine-grained query taxonomy, and database scaling. The platform offers diverse evaluation metrics and an interactive graphical interface, with customizable evaluation settings and detailed error analysis. Together, these features are intended to support the diagnosis and iterative refinement of Text-to-SQL systems.

📝 Abstract

Text-to-SQL models have significantly improved with the adoption of Large Language Models (LLMs), leading to their increasing use in real-world applications. Although many benchmarks exist for evaluating the performance of text-to-SQL models, they often rely on a single aggregate score, lack evaluation under realistic settings, and provide limited insight into model behaviour across different query types. In this work, we present SQLyzr, a comprehensive benchmark and evaluation platform for text-to-SQL models. SQLyzr incorporates a diverse set of evaluation metrics that capture multiple aspects of generated queries, while enabling more realistic evaluation through workload alignment with real-world SQL usage patterns and database scaling. It further supports fine-grained query classification, error analysis, and workload augmentation, allowing users to better diagnose and improve text-to-SQL models. This demonstration showcases these capabilities through an interactive experience. Through SQLyzr's graphical interface, users can customize evaluation settings, analyze fine-grained reports, and explore additional features of the platform. We envision that SQLyzr facilitates the evaluation and iterative improvement of text-to-SQL models by addressing key limitations of existing benchmarks. The source code of SQLyzr is available at https://github.com/sepideh-abedini/SQLyzr.
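
To make the contrast between a single aggregate score and a per-query-type report concrete, here is a minimal sketch. It is not SQLyzr's implementation or API: the `run` and `accuracy_by_category` helpers, the category labels, the toy `emp` table, and the choice of execution-result matching as the metric are all assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict


def run(conn, sql):
    """Return the result rows as a sorted list, or None if the query fails."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def accuracy_by_category(conn, examples):
    """Compute execution-match accuracy per category.

    examples: iterable of (category, gold_sql, predicted_sql) tuples.
    A prediction counts as correct when both queries run and return
    the same rows, ignoring row order.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for category, gold_sql, predicted_sql in examples:
        totals[category] += 1
        gold, pred = run(conn, gold_sql), run(conn, predicted_sql)
        if gold is not None and gold == pred:
            hits[category] += 1
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical toy database and hand-labelled examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER, dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES (1, 'a', 10), (2, 'a', 20), (3, 'b', 30);
""")
examples = [
    ("filter",
     "SELECT id FROM emp WHERE salary > 15",
     "SELECT id FROM emp WHERE salary >= 20"),
    ("filter",
     "SELECT id FROM emp WHERE dept = 'a'",
     "SELECT id FROM emp WHERE dept = 'b'"),
    ("aggregate",
     "SELECT dept, SUM(salary) FROM emp GROUP BY dept",
     "SELECT dept, SUM(salary) FROM emp GROUP BY dept"),
]

report = accuracy_by_category(conn, examples)
for category, acc in sorted(report.items()):
    print(f"{category}: {acc:.2f}")
```

In this toy setup the overall accuracy would hide that all errors fall in one category, which is the kind of detail a per-category breakdown is meant to surface.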

Problem

Research questions and friction points this paper is trying to address.

Text-to-SQL

benchmark

evaluation

query classification

realistic evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-to-SQL

evaluation benchmark

workload alignment