Bridging the Gap: Enabling Natural Language Queries for NoSQL Databases through Text-to-NoSQL Translation

📅 2025-02-16
🤖 AI Summary
To help non-technical users query NoSQL databases directly, this paper proposes an end-to-end method for translating natural language into NoSQL queries. It formally defines the Text-to-NoSQL task for the first time and introduces TEND, the first large-scale, open-source benchmark dataset for the task, comprising over 12,000 high-quality pairs of natural language questions and NoSQL queries. It also proposes SMART, a multi-step reasoning framework that combines small language models (SLMs) with retrieval-augmented generation (RAG), supported by automated data synthesis and execution-level evaluation. Experiments on TEND show substantial improvements in query accuracy and executability, establishing a new state-of-the-art (SOTA) baseline for the task. The core contributions span four dimensions: (1) a formal task definition, (2) construction of the first large-scale Text-to-NoSQL benchmark, (3) design of a lightweight yet effective SLM-RAG framework, and (4) introduction of an execution-grounded evaluation paradigm.

📝 Abstract
NoSQL databases have become increasingly popular due to their outstanding performance in handling large-scale, unstructured, and semi-structured data, highlighting the need for user-friendly interfaces to bridge the gap between non-technical users and complex database queries. In this paper, we introduce the Text-to-NoSQL task, which aims to convert natural language queries into NoSQL queries, thereby lowering the technical barrier for non-expert users. To promote research in this area, we developed a novel automated dataset construction process and released a large-scale, open-source dataset for this task, named TEND (short for Text-to-NoSQL Dataset). Additionally, we developed SMART, an SLM (Small Language Model)-assisted and RAG (Retrieval-Augmented Generation)-assisted multi-step framework designed specifically for Text-to-NoSQL conversion. To ensure comprehensive evaluation of the models, we also introduced a detailed set of metrics that assess a model's performance based on both the generated query itself and its execution results. Our experimental results demonstrate the effectiveness of our approach and establish a benchmark for future research in this emerging field. We believe that our contributions will pave the way for more accessible and intuitive interactions with NoSQL databases.
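
To make the distinction between query-level and execution-level evaluation concrete, the following is a minimal sketch and not the paper's actual metric suite. It assumes MongoDB-style filter documents and evaluates them against a small in-memory collection; the `BOOKS` data, the supported operator subset, and all function names are hypothetical.

```python
from typing import Any

# Hypothetical toy collection standing in for a real NoSQL database.
BOOKS = [
    {"title": "Dune", "year": 1965, "genre": "sci-fi"},
    {"title": "Neuromancer", "year": 1984, "genre": "sci-fi"},
    {"title": "Emma", "year": 1815, "genre": "romance"},
]

# Small subset of MongoDB comparison operators supported by this sketch.
OPS = {
    "$eq": lambda a, b: a == b,
    "$gt": lambda a, b: a is not None and a > b,
    "$gte": lambda a, b: a is not None and a >= b,
    "$lt": lambda a, b: a is not None and a < b,
    "$lte": lambda a, b: a is not None and a <= b,
}


def matches(doc: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if the document satisfies every condition in the filter."""
    for field, cond in flt.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            # Operator form, e.g. {"year": {"$gt": 1980}}.
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:
            # Shorthand equality form, e.g. {"genre": "sci-fi"}.
            return False
    return True


def run_find(collection: list[dict], flt: dict[str, Any]) -> list[dict]:
    """Evaluate a filter against the in-memory collection."""
    return [doc for doc in collection if matches(doc, flt)]


def same_execution_result(collection: list[dict], predicted: dict, gold: dict) -> bool:
    """Execution-level check: both filters return the same documents, ignoring order."""
    def canonical(docs: list[dict]) -> list[str]:
        return sorted(repr(sorted(d.items())) for d in docs)

    return canonical(run_find(collection, predicted)) == canonical(run_find(collection, gold))


# Question: "Which sci-fi books were published after 1980?"
gold = {"genre": "sci-fi", "year": {"$gt": 1980}}
predicted = {"genre": "sci-fi", "year": {"$gte": 1981}}

print(predicted == gold)                               # query-level exact match
print(same_execution_result(BOOKS, predicted, gold))   # execution-level match
```

In this example the predicted filter differs textually from the gold filter but selects the same documents, which is the kind of case an execution-based metric can credit and a purely query-based metric cannot.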
Problem

Research questions and friction points this paper is trying to address.

Convert natural language to NoSQL queries
Develop automated dataset construction process
Design multi-step framework for Text-to-NoSQL conversion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-to-NoSQL task introduced
SMART framework for conversion
Open-source dataset TEND created
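
The retrieval-augmented part of such a framework can be illustrated with a small sketch. This is not SMART itself: it assumes a simple token-overlap (Jaccard) retriever over an invented pool of question and query pairs, and it only assembles a few-shot prompt rather than calling any model. The `EXAMPLES` pool, the prompt layout, and all function names are hypothetical.

```python
import re

# Hypothetical example pool; these question/query pairs are invented for illustration.
EXAMPLES = [
    ("How many orders were placed in 2023?",
     'db.orders.countDocuments({"year": 2023})'),
    ("List the names of customers from Paris.",
     'db.customers.find({"city": "Paris"}, {"name": 1})'),
    ("Find books published after 1980.",
     'db.books.find({"year": {"$gt": 1980}})'),
]


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(question: str, pool: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k pool entries whose questions overlap most with the input question."""
    q = tokens(question)
    ranked = sorted(pool, key=lambda ex: jaccard(q, tokens(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, pool: list[tuple[str, str]], k: int = 2) -> str:
    """Assemble a few-shot prompt from the retrieved examples plus the new question."""
    parts = ["Translate the question into a MongoDB query."]
    for ex_question, ex_query in retrieve(question, pool, k):
        parts.append(f"Question: {ex_question}\nQuery: {ex_query}")
    parts.append(f"Question: {question}\nQuery:")
    return "\n\n".join(parts)


print(build_prompt("Find books published before 1950.", EXAMPLES, k=1))
```

A production retriever would more likely use dense embeddings than token overlap; the lexical version is used here only to keep the sketch dependency-free.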
Jinwei Lu
The Hong Kong Polytechnic University, Hong Kong, China
Yuanfeng Song
Unknown affiliation
NLP4Data, Data Visualization, Text2SQL, LLM
Zhiqian Qin
The Hong Kong Polytechnic University, Hong Kong, China
Haodi Zhang
Associate Professor of CS, Shenzhen University, China
AI, Knowledge Representation, NLP
Chen Zhang
The Hong Kong Polytechnic University, Hong Kong, China
Raymond Chi-Wing Wong
The Hong Kong University of Science and Technology
databases, data mining