🤖 AI Summary
This work addresses the fundamental trade-off between efficiency and accuracy in large language model (LLM)-enhanced data systems for semantic querying. The authors propose Stretto, a novel execution engine that formulates query planning as a constrained optimization problem, jointly selecting operator implementations and allocating error budgets to provide end-to-end quality guarantees. Its key innovations are a new class of physical operators based on KV caching, which maps the discrete space of operators onto a continuous spectrum of efficiency–accuracy trade-offs, and a gradient-based optimizer that enables global query planning with fine-grained operator selection and error-budget allocation. Experimental results demonstrate that Stretto significantly outperforms state-of-the-art systems while strictly adhering to query quality constraints.
📝 Abstract
LLM-augmented data systems enable semantic querying over structured and unstructured data, but executing queries with LLM-powered operators introduces a fundamental runtime-accuracy trade-off. In this paper, we present Stretto, a new execution engine that provides end-to-end query guarantees while efficiently navigating this trade-off in a holistic manner. For this, Stretto formulates query planning as a constrained optimization problem and uses a gradient-based optimizer to jointly select operator implementations and allocate error budgets across pipelines. Moreover, to enable fine-grained execution choices, Stretto introduces a novel technique that uses KV caching to realize a spectrum of physical operators, transforming a sparse design space into a dense continuum of runtime-accuracy trade-offs. Experiments show that Stretto outperforms state-of-the-art systems while consistently meeting quality guarantees.
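To make the planning problem concrete, the sketch below illustrates the *kind* of constrained optimization the abstract describes: allocating a shared end-to-end error budget across operators so that total runtime is minimized, using projected gradient descent. This is not Stretto's actual algorithm; the per-operator cost models, coefficients, and the simple rescaling projection are all invented for illustration.

```python
import numpy as np

# Hypothetical cost models: each operator's runtime falls as its allotted
# error budget grows (cheaper, less accurate implementations become usable).
# The coefficients below are illustrative, not from the paper.
base = np.array([5.0, 3.0, 8.0])      # runtime at zero error budget
slope = np.array([20.0, 10.0, 40.0])  # how quickly extra budget pays off

def runtime(e):
    """Total pipeline runtime under per-operator error budgets e."""
    return np.sum(base / (1.0 + slope * e))

def grad(e):
    """Gradient of total runtime w.r.t. the error budgets."""
    return -base * slope / (1.0 + slope * e) ** 2

def project(e, eps):
    """Keep budgets feasible: e >= 0 and sum(e) <= eps.
    A simple proportional rescaling, not an exact Euclidean projection."""
    e = np.clip(e, 0.0, None)
    total = e.sum()
    if total > eps:
        e = e * (eps / total)
    return e

eps = 0.05                    # end-to-end error budget (quality constraint)
e = np.full(3, eps / 3)       # start from a uniform split
for _ in range(500):
    e = project(e - 1e-4 * grad(e), eps)
```

After convergence, operators whose runtime is most sensitive to extra budget (here the third one) receive a larger share than a uniform split would give them, while the total allocation still respects the end-to-end constraint.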