TiInsight: A SQL-based Automated Exploratory Data Analysis System through Large Language Models

📅 2026-01-14
🤖 AI Summary
This work proposes TiInsight, an LLM-based automated system for cross-domain exploratory data analysis (EDA). It addresses the limited cross-domain generalization of existing SQL-based approaches and makes fuller use of the capabilities of large language models. Through natural language interaction, TiInsight supports end-to-end data understanding, SQL generation, and visualization. The system introduces hierarchical data context (HDC) generation together with question clarification and decomposition, which are intended to improve cross-domain adaptability and accessibility for users. Combining text-to-SQL (TiSQL), data visualization (TiChart), and a graphical user interface, TiInsight has been deployed in PingCAP's production environment and demonstrated on representative datasets.

📝 Abstract
SQL-based exploratory data analysis has garnered significant attention within the data analysis community. The emergence of large language models (LLMs) has facilitated a paradigm shift from manual to automated data exploration. However, existing methods generally lack the ability to perform cross-domain analysis, and the capabilities of LLMs remain insufficiently explored. This paper presents TiInsight, an SQL-based automated cross-domain exploratory data analysis system. First, TiInsight offers a user-friendly GUI that lets users explore data using natural language queries. Second, TiInsight offers a robust cross-domain exploratory data analysis pipeline: hierarchical data context (HDC) generation, question clarification and decomposition, text-to-SQL (TiSQL), and data visualization (TiChart). Third, we have implemented and deployed TiInsight in the production environment of PingCAP and demonstrated its capabilities using representative datasets. The demo video is available at https://youtu.be/JzYFyYd-emI.
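The four pipeline stages named in the abstract can be pictured with a toy sketch. This is not TiInsight's implementation: every name below (`DataContext`, `build_context`, `decompose`, `text_to_sql`, `choose_chart`, `explore`, `fake_llm`) is hypothetical, the LLM is replaced by a canned stub, SQLite stands in for the production database, the "hierarchical" context is reduced to a table-to-columns map, and question clarification is omitted for brevity.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for an LLM call: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class DataContext:
    """Simplified data context: table name -> column names."""
    tables: dict[str, list[str]] = field(default_factory=dict)


def build_context(conn: sqlite3.Connection) -> DataContext:
    """Stage 1 (assumed shape): collect schema information for prompting."""
    ctx = DataContext()
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in rows:
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        ctx.tables[table] = [col[1] for col in cols]  # index 1 is the column name
    return ctx


def decompose(question: str, llm: LLM) -> list[str]:
    """Stage 2: ask the model to split a broad question into sub-questions."""
    reply = llm(f"Split into sub-questions, one per line:\n{question}")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def text_to_sql(sub_question: str, ctx: DataContext, llm: LLM) -> str:
    """Stage 3: ask the model for one SQL query, given the schema context."""
    schema = "\n".join(f"{t}({', '.join(c)})" for t, c in ctx.tables.items())
    return llm(f"Schema:\n{schema}\nWrite one SQL query for: {sub_question}").strip()


def choose_chart(columns: list[str], rows: list[tuple]) -> str:
    """Stage 4 (toy rule): a label column plus a numeric column gives a bar chart."""
    if len(columns) == 2 and rows and all(isinstance(r[1], (int, float)) for r in rows):
        return "bar"
    return "table"


def explore(question: str, conn: sqlite3.Connection, llm: LLM) -> list[dict]:
    """Run the stages in order and return one result per sub-question."""
    ctx = build_context(conn)
    results = []
    for sub in decompose(question, llm):
        sql = text_to_sql(sub, ctx, llm)
        cur = conn.execute(sql)
        columns = [d[0] for d in cur.description]
        rows = cur.fetchall()
        results.append({"question": sub, "sql": sql, "columns": columns,
                        "rows": rows, "chart": choose_chart(columns, rows)})
    return results


def fake_llm(prompt: str) -> str:
    """Canned replies so the sketch runs without any model."""
    if prompt.startswith("Split"):
        return "Total sales per region"
    return ("SELECT region, SUM(amount) AS total FROM sales "
            "GROUP BY region ORDER BY region")


def demo() -> list[dict]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("east", 10.0), ("west", 5.0), ("east", 2.5)])
    return explore("How do sales differ by region?", conn, fake_llm)
```

Calling `demo()` returns a single result whose rows are the per-region totals and whose suggested chart type is `"bar"`. In a real system the stub would be replaced by an actual model call, and model-generated SQL would need validation before execution.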
Problem

Research questions and friction points this paper is trying to address.

exploratory data analysis
cross-domain analysis
large language models
SQL-based analysis
automated data exploration
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-domain exploratory data analysis
large language models
text-to-SQL
hierarchical data context
automated data visualization
Authors
Jun-Peng Zhu (Northwest A&F University, PingCAP)
Boyan Niu (PingCAP, China)
Peng Cai (East China Normal University, China)
Zheming Ni (PingCAP, China)
Kai Xu (PingCAP, China)
Jiajun Huang (PingCAP, China)
Shengbo Ma (PingCAP, China)
Bing Wang (PingCAP, China)
Xuan Zhou (Professor, School of Data Science and Engineering, East China Normal University; database, information retrieval)
Guanglel Bao (PingCAP, China)
Donghui Zhang (PingCAP, China)
Liu Tang (PingCAP, China)
Qi Liu (PingCAP, China)