Bridging the Gap: Enabling Natural Language Queries for NoSQL Databases through Text-to-NoSQL Translation

📅 2025-02-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the difficulty non-technical users face in querying NoSQL databases directly, this paper proposes an end-to-end method for translating natural language into NoSQL queries. It formally defines the Text-to-NoSQL task and introduces TEND, the first large-scale, open-source benchmark dataset for the task, comprising over 12,000 natural language–NoSQL query pairs. It also proposes SMART, a multi-step reasoning framework that combines small language models (SLMs) with retrieval-augmented generation (RAG), supported by automated data synthesis and execution-level evaluation. Experiments on TEND show improvements in query accuracy and executability, establishing a baseline for future work on the task. The core contributions are fourfold: (1) a formal task definition, (2) construction of the first large-scale Text-to-NoSQL benchmark, (3) design of a lightweight SLM-RAG framework, and (4) an evaluation scheme grounded in both the query text and its execution results.

📝 Abstract

NoSQL databases have become increasingly popular due to their outstanding performance in handling large-scale, unstructured, and semi-structured data, highlighting the need for user-friendly interfaces that bridge the gap between non-technical users and complex database queries. In this paper, we introduce the Text-to-NoSQL task, which aims to convert natural language queries into NoSQL queries, thereby lowering the technical barrier for non-expert users. To promote research in this area, we developed a novel automated dataset construction process and released a large-scale, open-source dataset for this task, named TEND (short for Text-to-NoSQL Dataset). Additionally, we designed SMART, a multi-step framework for Text-to-NoSQL conversion that is assisted by an SLM (Small Language Model) and RAG (Retrieval-Augmented Generation). To ensure comprehensive evaluation of the models, we also introduced a detailed set of metrics that assess a model's performance from both the query itself and its execution results. Our experimental results demonstrate the effectiveness of our approach and establish a benchmark for future research in this emerging field. We believe that our contributions will pave the way for more accessible and intuitive interactions with NoSQL databases.
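
To illustrate why the abstract evaluates both the query itself and its execution results, the following is a minimal sketch rather than the paper's actual metrics. It assumes MongoDB-style filter documents as the NoSQL dialect, and the collection `DOCS` and the helpers `matches`, `run`, `exact_match`, and `execution_match` are hypothetical names introduced only for this example. A predicted query can differ textually from the reference yet return the same documents.

```python
from typing import Any

# Hypothetical toy collection; not taken from TEND.
DOCS = [
    {"name": "Ada", "age": 36, "city": "London"},
    {"name": "Bo", "age": 29, "city": "Oslo"},
    {"name": "Cy", "age": 41, "city": "London"},
]

# Small subset of MongoDB-style comparison operators (assumed dialect).
OPS = {
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
}


def matches(doc: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if doc satisfies every condition in the filter."""
    for field, cond in flt.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            # Operator form, e.g. {"age": {"$gt": 40}}
            for op, operand in cond.items():
                if value is None or not OPS[op](value, operand):
                    return False
        elif value != cond:
            # Plain equality form, e.g. {"city": "London"}
            return False
    return True


def run(flt: dict[str, Any]) -> list[dict[str, Any]]:
    """Execute a filter against the in-memory collection."""
    return [d for d in DOCS if matches(d, flt)]


def exact_match(pred: dict[str, Any], gold: dict[str, Any]) -> bool:
    """Query-level check: the two filters are structurally identical."""
    return pred == gold


def execution_match(pred: dict[str, Any], gold: dict[str, Any]) -> bool:
    """Execution-level check: the two filters return the same documents."""
    return run(pred) == run(gold)


# "Which people in London are older than 40?"
gold = {"city": "London", "age": {"$gt": 40}}
pred = {"city": "London", "age": {"$gte": 41}}

print(exact_match(pred, gold), execution_match(pred, gold))
```

On this toy data the two filters are not identical but select the same document, so a query-level metric alone would penalize a prediction that an execution-level metric accepts. The execution-level result depends on the data in the collection, which is one reason to report both kinds of metric.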

Problem

Research questions and friction points this paper is trying to address.

Convert natural language to NoSQL queries

Develop automated dataset construction process

Design multi-step framework for Text-to-NoSQL conversion
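
As a rough illustration of the retrieval-augmented step in such a framework, the sketch below picks the stored examples most similar to a new question and assembles a few-shot prompt. This is not SMART's actual pipeline: the example pool `EXAMPLES`, the token-overlap (Jaccard) similarity, and the functions `retrieve` and `build_prompt` are assumptions made for this example, and the MongoDB shell syntax in the pool is an assumed target dialect.

```python
import re

# Hypothetical pool of (question, query) pairs; not taken from TEND.
EXAMPLES = [
    ("List customers in London", 'db.customers.find({"city": "London"})'),
    ("Count orders placed in 2024", 'db.orders.countDocuments({"year": 2024})'),
    ("Show products cheaper than 10 dollars", 'db.products.find({"price": {"$lt": 10}})'),
]


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens of a question."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity, used here as a stand-in for a learned retriever."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k stored pairs whose questions are most similar to the input."""
    q = tokens(question)
    ranked = sorted(EXAMPLES, key=lambda ex: jaccard(q, tokens(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, k: int = 2) -> str:
    """Assemble a few-shot prompt from the retrieved pairs plus the new question."""
    shots = "\n".join(f"Question: {nl}\nQuery: {query}" for nl, query in retrieve(question, k))
    return f"{shots}\nQuestion: {question}\nQuery:"


print(build_prompt("List customers in Oslo", k=1))
```

The resulting prompt would then be passed to a language model to generate the query. In practice an embedding-based retriever would likely replace the token-overlap score, which is used here only to keep the sketch self-contained.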

Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-to-NoSQL task introduced

SMART framework for conversion

Open-source dataset TEND created

🔎 Similar Papers

A Survey on Employing Large Language Models for Text-to-SQL Tasks