SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models

📅 2026-04-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional SQL cannot semantically match entities or analyze unstructured text, and existing LLM-augmented semantic operator systems require users to construct complex query pipelines by hand. This work proposes SEMA-SQL, a system that integrates the semantic capabilities of large language models (LLMs) into a declarative query language via user-defined functions (UDFs). Its Hybrid Relational Algebra (HRA) unifies relational operators with LLM UDFs, so natural language questions can be translated automatically into efficient hybrid queries. The system uses in-context learning for query generation, cost-based optimization with UDF rewriting, and batching of LLM invocations, which improves query capability on standard and extended benchmarks. It reduces LLM invocations in semantic joins by an average of 93%.
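To make the idea of an LLM UDF inside a declarative query concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not SEMA-SQL's implementation or syntax: the function name SEM_MATCH, the tables, and the alias lookup that stands in for a real LLM call are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical stand-in for an LLM: a tiny alias table instead of a model call.
ALIASES = {
    "intl business machines": "ibm",
    "i.b.m.": "ibm",
}

def fake_llm_same_entity(a: str, b: str) -> int:
    """Return 1 if both names resolve to the same entity (stubbed, not a real LLM)."""
    def norm(s: str) -> str:
        key = s.strip().lower()
        return ALIASES.get(key, key)
    return int(norm(a) == norm(b))

conn = sqlite3.connect(":memory:")
# Register the stub as a two-argument SQL function (the name is an assumption).
conn.create_function("SEM_MATCH", 2, fake_llm_same_entity)
conn.executescript("""
CREATE TABLE orders(customer TEXT, amount REAL);
CREATE TABLE accounts(name TEXT, region TEXT);
INSERT INTO orders VALUES
  ('Intl Business Machines', 120.0), ('Acme', 40.0), ('I.B.M.', 30.0);
INSERT INTO accounts VALUES ('IBM', 'US'), ('Acme', 'EU');
""")

# A hybrid query: relational join and aggregation, with a semantic join predicate.
rows = conn.execute("""
SELECT a.name, SUM(o.amount)
FROM orders o JOIN accounts a ON SEM_MATCH(o.customer, a.name)
GROUP BY a.name
ORDER BY a.name
""").fetchall()
print(rows)
```

The join predicate is evaluated once per candidate pair here, which is exactly the cost that makes optimization and batching of LLM calls matter in a real system.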

📝 Abstract

Relational databases excel at structured data analysis, but real-world queries increasingly require capabilities beyond standard SQL, such as semantically matching entities across inconsistent names, extracting information not explicitly stored in schemas, and analyzing unstructured text. While text-to-SQL systems enable natural language querying, they remain limited to relational operations and cannot leverage the semantic reasoning capabilities of modern large language models (LLMs). Conversely, recent semantic operator systems extend relational algebra with LLM-powered operations (e.g., semantic joins, mappings, aggregations), but require users to manually construct complex query pipelines. To address this gap, we present SEMA-SQL, a system that automatically answers natural language questions by generating efficient queries that combine relational operations with LLM semantic reasoning. We formalize Hybrid Relational Algebra (HRA), a declarative abstraction unifying traditional relational operators with LLM user-defined functions (UDFs). The system automates three critical aspects: (1) query generation via in-context learning that produces HRA queries with precise natural language specifications for LLM UDFs, (2) query optimization via cost-based transformations and UDF rewriting, and (3) efficient execution algorithms that reduce LLM invocations by an average of 93% in semantic joins through intelligent batching. Extensive experiments with known benchmarks, and extensions thereof, demonstrate the significant query capability improvements possible with our design.
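The abstract does not describe the batching algorithm itself, so the following is only a rough sketch of the general idea: packing several candidate pairs into one LLM request instead of issuing one request per pair. The CountingMatcher class, its match_batch method, and the string-normalization rule it uses in place of a model are hypothetical, and the call counts it produces are properties of this toy input, not the paper's measurements.

```python
from itertools import product
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]

class CountingMatcher:
    """Hypothetical stand-in for an LLM endpoint; counts how often it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def match_batch(self, pairs: Sequence[Pair]) -> List[bool]:
        # One invocation judges every pair in the batch.
        self.calls += 1
        clean = lambda s: s.lower().replace(".", "")
        return [clean(a) == clean(b) for a, b in pairs]

def semantic_join(left: Sequence[str], right: Sequence[str],
                  matcher: CountingMatcher, batch_size: int) -> List[Pair]:
    """Join two columns on a semantic predicate, sending pairs in batches."""
    pairs = list(product(left, right))
    matches: List[Pair] = []
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        verdicts = matcher.match_batch(chunk)
        matches.extend(p for p, ok in zip(chunk, verdicts) if ok)
    return matches

left = ["I.B.M.", "Acme", "Globex"]
right = ["ibm", "ACME", "Initech", "Umbrella"]

naive, batched = CountingMatcher(), CountingMatcher()
naive_result = semantic_join(left, right, naive, batch_size=1)      # one call per pair
batched_result = semantic_join(left, right, batched, batch_size=6)  # six pairs per call
print(naive.calls, batched.calls, batched_result)
```

Both runs return the same matches; only the number of invocations differs. A real system would also have to bound batch size by the model's context window and check that answer quality holds up as batches grow.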

Problem

Research questions and friction points this paper is trying to address.

text-to-SQL

semantic reasoning

large language models

relational databases

query generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Relational Algebra

LLM-powered UDFs

semantic joins