🤖 AI Summary
To address the challenge non-expert users face in efficiently querying and analyzing large tabular data, this paper proposes an end-to-end framework that translates natural language queries into executable query plans using large language models (LLMs). Unlike conventional SQL generation approaches, the method employs iterative semantic parsing to map natural language into heterogeneous operation sequences, including statistical and machine learning primitives (e.g., PCA, anomaly detection), and executes them directly on the data outside traditional databases. Because only the plan, not the full dataset, passes through the model, this circumvents LLM context-length limitations and the overhead of full-data loading. Experiments on standard benchmarks and large-scale scientific tabular datasets demonstrate substantial improvements in task completion rate and execution efficiency for complex analytical tasks. The framework supports flexible, scalable data analysis beyond the expressive capacity of SQL, offering a novel pathway for the NL2Data analysis paradigm.
📝 Abstract
Efficient querying and analysis of large tabular datasets remain significant challenges, especially for users without expertise in programming languages like SQL. Text-to-SQL approaches have shown promising performance on benchmark data; however, they inherit SQL's drawbacks, including inefficiency with large datasets and limited support for complex data analyses beyond basic querying. We propose a novel framework that transforms natural language queries into query plans. Our solution is implemented outside traditional databases, allowing us to support classical SQL commands while avoiding SQL's inherent limitations. Additionally, we enable complex analytical functions, such as principal component analysis and anomaly detection, providing greater flexibility and extensibility than traditional SQL capabilities. We leverage LLMs to iteratively interpret queries and construct operation sequences, addressing computational complexity by incrementally building solutions. By executing operations directly on the data, we overcome context length limitations without requiring the entire dataset to be processed by the model. We validate our framework through experiments on both standard databases and large scientific tables, demonstrating its effectiveness in handling extensive datasets and performing sophisticated data analyses.
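The pipeline the abstract describes, an LLM emitting an operation sequence that is then executed directly on the data, might be sketched as follows. The plan format, the operation names, and the `execute_plan` helper are illustrative assumptions for this sketch, not the paper's actual interface; only the short plan, never the table itself, would need to pass through the model.

```python
# Minimal sketch: execute a hypothetical LLM-produced operation plan
# on a pandas DataFrame, outside the model's context window.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

def execute_plan(df: pd.DataFrame, plan: list) -> pd.DataFrame:
    """Run each operation in sequence on the data itself."""
    for step in plan:
        op = step["op"]
        if op == "filter":      # classic SQL-style selection
            df = df.query(step["expr"])
        elif op == "pca":       # analytical primitive beyond plain SQL
            k = step["k"]
            comps = PCA(n_components=k).fit_transform(df)
            df = pd.DataFrame(comps, columns=[f"pc{i}" for i in range(k)])
        elif op == "anomaly":   # flag outliers with an isolation forest
            labels = IsolationForest(random_state=0).fit_predict(df)
            df = df.assign(anomaly=(labels == -1))
        else:
            raise ValueError(f"unknown operation: {op}")
    return df

# A plan an LLM might emit for a query like
# "find anomalies among the first two principal components".
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))
plan = [
    {"op": "filter", "expr": "a > -1"},
    {"op": "pca", "k": 2},
    {"op": "anomaly"},
]
result = execute_plan(data, plan)
print(list(result.columns))  # pc0, pc1, anomaly
```

Building the sequence incrementally, as the paper proposes, would let the model refine or extend such a plan step by step rather than commit to a single monolithic query.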