Text2VectorSQL: Bridging Text-to-SQL and Vector Search for Unified Natural Language Queries

📅 2025-06-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing Text-to-SQL methods struggle with unstructured data and semantically ambiguous queries, while VectorSQL systems rely on manual construction and lack tailored evaluation, so their theoretical potential has not translated into practical utility. Method: The authors propose Text2VectorSQL, the first framework unifying Text-to-SQL generation with vector retrieval to enable semantic filtering, multimodal matching, and retrieval acceleration. The approach introduces an end-to-end automated annotation pipeline that integrates SQL generation, vector index construction, semantic query expansion, synthetic data training, and expert validation. Contribution/Results: Experiments show that the dedicated models significantly outperform baselines across diverse natural language database querying tasks. The work establishes Text2VectorSQL as a new task paradigm and provides a scalable, unified foundation, with its own evaluation setup, for general-purpose natural language database interfaces.

📝 Abstract

While Text-to-SQL enables natural language interaction with structured databases, its effectiveness diminishes with unstructured data or ambiguous queries due to rigid syntax and limited expressiveness. Concurrently, vector search has emerged as a powerful paradigm for semantic retrieval, particularly for unstructured data. However, existing VectorSQL implementations still rely heavily on manual crafting and lack tailored evaluation frameworks, leaving a significant gap between theoretical potential and practical deployment. To bridge these complementary paradigms, we introduce Text2VectorSQL, a novel framework unifying Text-to-SQL and vector search to overcome expressiveness constraints and support more diverse and holistic natural language queries. Specifically, Text2VectorSQL enables semantic filtering, multi-modal matching, and retrieval acceleration. For evaluation, we build vector indexes on appropriate columns, extend user queries with semantic search, and annotate ground truths via an automatic pipeline with expert review. Furthermore, we develop dedicated Text2VectorSQL models with synthetic data, demonstrating significant performance improvements over baseline methods. Our work establishes the foundation for the Text2VectorSQL task, paving the way for more versatile and intuitive database interfaces. The repository will be publicly available at https://github.com/Open-DataFlow/Text2VectorSQL.
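
To make "semantic filtering" concrete, the following is a minimal sketch of what a query mixing a structured predicate with vector similarity might look like. It is not the paper's implementation or its VectorSQL dialect: the `products` table, the hand-written 3-dimensional embeddings, and the `cosine_sim` SQL function are all hypothetical, and an in-memory SQLite database with a brute-force scan stands in for a real vector index.

```python
import json
import math
import sqlite3


def cosine_sim(a_json, b_json):
    """Cosine similarity between two vectors stored as JSON arrays."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_db():
    """Create a toy table whose rows carry a price and a made-up embedding."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL (hypothetical name, for illustration).
    conn.create_function("cosine_sim", 2, cosine_sim)
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, "
        "price REAL, embedding TEXT)"
    )
    rows = [
        (1, "trail running shoes", 45.0, [0.9, 0.1, 0.0]),
        (2, "leather office shoes", 40.0, [0.1, 0.9, 0.0]),
        (3, "marathon racing flats", 120.0, [0.95, 0.05, 0.0]),
        (4, "yoga mat", 20.0, [0.0, 0.2, 0.9]),
    ]
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [(i, n, p, json.dumps(e)) for i, n, p, e in rows],
    )
    return conn


def search(conn, query_vec, max_price, k):
    """Structured filter (price) plus semantic ranking (embedding similarity)."""
    sql = (
        "SELECT name FROM products "
        "WHERE price < ? "
        "ORDER BY cosine_sim(embedding, ?) DESC "
        "LIMIT ?"
    )
    cur = conn.execute(sql, (max_price, json.dumps(query_vec), k))
    return [name for (name,) in cur]


if __name__ == "__main__":
    conn = build_db()
    # Assumed embedding of a request such as "cheap shoes for jogging".
    print(search(conn, [1.0, 0.0, 0.0], max_price=50, k=2))
```

The `WHERE` clause is the part a conventional Text-to-SQL model already handles, while the `ORDER BY` on similarity is the part that needs vector search. A production system would replace the brute-force scan with an index over the embedding column.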

Problem

Research questions and friction points this paper is trying to address.

Unifying Text-to-SQL and vector search for diverse queries

Overcoming the expressiveness constraints of Text-to-SQL on unstructured data and ambiguous queries

Addressing lack of tailored evaluation frameworks for VectorSQL

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies Text-to-SQL and vector search

Enables semantic filtering and multi-modal matching

Develops dedicated models with synthetic data

🔎 Similar Papers

A Survey on Employing Large Language Models for Text-to-SQL Tasks