Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB

📅 2025-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of jointly retrieving structured tabular data and unstructured text in knowledge-intensive analytical workloads, this paper proposes FlockMTL, a unified execution framework that embeds LLMs and retrieval-augmented generation (RAG) directly within DuckDB. Methodologically, it introduces three key innovations: (1) the abstraction of PROMPTs and MODELs as first-class SQL DDL schema objects alongside TABLEs; (2) cost-aware optimization of LLM operators, including batched inference and caching, under a resource-independence principle; and (3) native LLM-based scalar and aggregate functions, seamless RAG integration, and relational-algebra-inspired query optimization with model orchestration. The framework significantly reduces end-to-end development complexity while preserving SQL's declarative simplicity, enabling tuple-level chained predictions with low-overhead, high-throughput LLM inference and no sacrifice of expressiveness.

📝 Abstract
Knowledge-intensive analytical applications retrieve context from both structured tabular data and unstructured, free-text documents for effective decision-making. Large language models (LLMs) have made it significantly easier to prototype such retrieval and reasoning data pipelines. However, implementing these pipelines efficiently still demands significant effort and poses several challenges. This often involves orchestrating heterogeneous data systems, managing data movement, and handling low-level implementation details, e.g., LLM context management. To address these challenges, we introduce FlockMTL: an extension for DBMSs that deeply integrates LLM capabilities and retrieval-augmented generation (RAG). FlockMTL includes model-driven scalar and aggregate functions, enabling chained predictions through tuple-level mappings and reductions. Drawing inspiration from the relational model, FlockMTL incorporates: (i) cost-based optimizations, which seamlessly apply techniques such as batching and caching; and (ii) resource independence, enabled through novel SQL DDL abstractions: PROMPT and MODEL, introduced as first-class schema objects alongside TABLE. FlockMTL streamlines the development of knowledge-intensive analytical applications, and its optimizations ease the implementation burden.
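The PROMPT and MODEL abstractions described in the abstract can be pictured with a short SQL sketch. This is illustrative only: the object names, the two-step DDL shape, and the `llm_complete` scalar call are assumptions about FlockMTL-style syntax, not verbatim from the paper.

```sql
-- Hypothetical sketch: PROMPT and MODEL as first-class schema objects
-- alongside TABLE (names and exact FlockMTL syntax are assumed).
CREATE MODEL review_model AS ('gpt-4o-mini', 'openai');
CREATE PROMPT sentiment_prompt AS
    'Classify the sentiment of this product review as positive or negative.';

-- LLM-backed scalar function applied per tuple; per the paper's design,
-- the optimizer can batch and cache the underlying inference calls.
SELECT review_id,
       llm_complete('review_model', 'sentiment_prompt', review_text) AS sentiment
FROM product_reviews;
```

Because the model and prompt are schema objects rather than string literals inside the query, they can be swapped or versioned without rewriting the query itself, which is the resource-independence idea the abstract refers to.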
Problem

Research questions and friction points this paper is trying to address.

Integrating LLMs and RAG into DBMS for analytics
Reducing effort in retrieval-reasoning pipeline implementation
Optimizing resource use and cost in data systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep integration of LLMs and RAG into DuckDB
Model-driven scalar and aggregate functions
Cost-based optimizations with SQL DDL abstractions
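The model-driven aggregate functions listed above (tuple-level reductions) can be sketched in the same hedged style; `llm_reduce` and the grouping shape are assumed, not confirmed FlockMTL API:

```sql
-- Hypothetical LLM aggregate: reduce the tuples of each group into one
-- generated value, e.g. a summary of all reviews per product.
SELECT product_id,
       llm_reduce('review_model', 'summary_prompt', review_text) AS review_summary
FROM product_reviews
GROUP BY product_id;
```

A scalar LLM function maps one tuple to one prediction, while an aggregate like this reduces a whole group, which is how the summary's "tuple-level chained prediction" composes mappings and reductions inside ordinary SQL.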