🤖 AI Summary
To address the challenge non-technical users face in directly querying large-scale semi-structured time-series data (e.g., logs, telemetry), this paper introduces the first natural-language-to-Kusto-Query-Language (NL2KQL) framework. Methodologically, it proposes a cooperative three-module architecture—Schema Refiner, Dynamic Few-shot Selector, and Query Refiner—that jointly integrates LLM-based semantic parsing, schema refinement, context-aware few-shot retrieval, and KQL syntax/semantic error correction. The authors construct the first open-source, contextually grounded synthetic NLQ–KQL benchmark dataset and support multi-dimensional evaluation at both the execution level and the parsing level. Experiments on real-world Kusto deployments demonstrate a 32.7% improvement in execution accuracy over state-of-the-art baselines; ablation studies confirm the significant contribution of each module. All code, datasets, and evaluation tools are publicly released.
📝 Abstract
Data is growing rapidly in volume and complexity. Proficiency in database query languages is pivotal for crafting effective queries. As coding assistants become more prevalent, there is significant opportunity to enhance database query languages. The Kusto Query Language (KQL) is a widely used query language for large semi-structured data such as logs, telemetry, and time series in big data analytics platforms. This paper introduces NL2KQL, an innovative framework that uses large language models (LLMs) to convert natural language queries (NLQs) to KQL queries. The proposed NL2KQL framework includes several key components: the Schema Refiner, which narrows down the schema to its most pertinent elements; the Few-shot Selector, which dynamically selects relevant examples from a few-shot dataset; and the Query Refiner, which repairs syntactic and semantic errors in KQL queries. Additionally, this study outlines a method for generating large datasets of synthetic NLQ-KQL pairs that are valid within a specific database context. To validate NL2KQL's performance, we utilize an array of online (based on query execution) and offline (based on query parsing) metrics. Through ablation studies, the significance of each framework component is examined, and the datasets used for benchmarking are made publicly available. This work is the first of its kind and is compared with available baselines to demonstrate its effectiveness.
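To make the three-stage architecture concrete, here is a minimal sketch of how the components described above could fit together. All function names, the prompt format, and the stub heuristics (word-overlap schema pruning and example ranking, whitespace-only refinement) are illustrative assumptions, not the authors' actual implementation; a real system would use embedding-based retrieval and a KQL parser for error repair.

```python
# Hypothetical sketch of the NL2KQL pipeline; names and heuristics are
# illustrative, not the paper's actual API.

def schema_refiner(nlq, schema):
    """Keep only tables whose names appear related to the NLQ (naive stand-in
    for the paper's semantic schema refinement)."""
    words = nlq.lower().split()
    return {table: cols for table, cols in schema.items()
            if any(w in table.lower() for w in words)}

def few_shot_selector(nlq, examples, k=2):
    """Rank few-shot NLQ-KQL pairs by word overlap with the input NLQ
    (a stand-in for context-aware dynamic retrieval)."""
    words = set(nlq.lower().split())
    return sorted(examples,
                  key=lambda ex: -len(words & set(ex["nlq"].lower().split())))[:k]

def query_refiner(kql):
    """Placeholder: a real refiner would parse the KQL and repair
    syntactic and semantic errors before returning it."""
    return kql.strip()

def nl2kql(nlq, schema, examples, llm):
    """End-to-end pipeline: refine schema, pick examples, prompt the LLM,
    then refine the generated query."""
    refined_schema = schema_refiner(nlq, schema)
    shots = few_shot_selector(nlq, examples)
    prompt = (f"Schema: {refined_schema}\n"
              f"Examples: {shots}\n"
              f"NLQ: {nlq}\nKQL:")
    return query_refiner(llm(prompt))
```

In this sketch the LLM is passed in as a callable, so the pipeline can be exercised with a stub during testing and swapped for a real model client in deployment.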