PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees (Technical Report)

📅 2025-03-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing approximate query processing (AQP) techniques struggle to simultaneously guarantee bounded error, avoid maintenance overhead, and leave the underlying DBMS unmodified, which has limited industrial adoption. This paper proposes TAQA, a two-stage online AQP algorithm that achieves all three properties for arbitrary queries, and BSAP, a technique that lets TAQA use block-level sampling with statistical guarantees, so that it avoids the slowdown relative to exact queries that standard row-level sampling can cause. Both are implemented in PilotDB, a prototype middleware compatible with any DBMS that supports efficient block-level sampling. Evaluated on PostgreSQL, SQL Server, and DuckDB over real-world benchmarks, PilotDB achieves up to 126× speedup over exact query execution with a 5% guaranteed error.

📝 Abstract

After decades of research in approximate query processing (AQP), its adoption in the industry remains limited. Existing methods struggle to simultaneously provide user-specified error guarantees, eliminate maintenance overheads, and avoid modifications to database management systems. To address these challenges, we introduce two novel techniques, TAQA and BSAP. TAQA is a two-stage online AQP algorithm that achieves all three properties for arbitrary queries. However, it can be slower than exact queries if we use standard row-level sampling. BSAP resolves this by enabling block-level sampling with statistical guarantees in TAQA. We implement TAQA and BSAP in a prototype middleware system, PilotDB, that is compatible with all DBMSs supporting efficient block-level sampling. We evaluate PilotDB on PostgreSQL, SQL Server, and DuckDB over real-world benchmarks, demonstrating up to 126X speedups when running with a 5% guaranteed error.

Problem

Research questions and friction points this paper is trying to address.

Provide user-specified error guarantees in AQP

Eliminate maintenance overheads in approximate query processing

Avoid modifications to database management systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage online AQP algorithm TAQA

Block-level sampling with BSAP

DBMS-agnostic middleware PilotDB
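
To make the two-stage idea concrete, here is a minimal Python sketch of a generic pilot-then-final block sampling scheme for a SUM query. It is not the TAQA or BSAP algorithm from the paper, whose details are not given in this summary. The in-memory `blocks` list, the function names, the pilot size of 30 blocks, and the normal-approximation sample-size formula are all illustrative assumptions. In particular, the sketch ignores the uncertainty of the pilot estimates, which a real a priori guarantee would have to account for.

```python
import math
import random
from statistics import mean, stdev


def block_sums(blocks):
    """Collapse each block (a list of row values) to its sum."""
    return [sum(b) for b in blocks]


def required_blocks(pilot_sums, rel_err, z=1.96):
    """Textbook sample-size formula under a normal approximation.

    Assumption: sampling n blocks gives relative error of about
    z * s / (|m| * sqrt(n)), where m and s are the mean and standard
    deviation of the per-block sums seen in the pilot.
    """
    m = mean(pilot_sums)
    s = stdev(pilot_sums)  # needs at least two pilot blocks
    if m == 0:
        raise ValueError("relative error is undefined for a zero pilot mean")
    return math.ceil((z * s / (rel_err * abs(m))) ** 2)


def two_stage_sum(blocks, rel_err, pilot_size=30, seed=0):
    """Stage 1: pilot sample to plan. Stage 2: final sample to estimate."""
    rng = random.Random(seed)
    n_total = len(blocks)

    # Stage 1: a small pilot sample of whole blocks estimates variability.
    pilot = rng.sample(blocks, min(pilot_size, n_total))
    planned = required_blocks(block_sums(pilot), rel_err)

    # Stage 2: draw the planned number of blocks (at least 2, at most all).
    n_final = min(n_total, max(2, planned))
    final = rng.sample(blocks, n_final)

    # Scale the mean block sum up to the full table.
    estimate = n_total * mean(block_sums(final))
    return estimate, n_final


if __name__ == "__main__":
    data_rng = random.Random(42)
    table = [[data_rng.uniform(50, 150) for _ in range(100)] for _ in range(2000)]
    est, used = two_stage_sum(table, rel_err=0.01)
    exact = sum(sum(b) for b in table)
    print(f"blocks used: {used}/{len(table)}, estimate: {est:.1f}, exact: {exact:.1f}")
```

In a middleware setting, the two `rng.sample` calls would presumably be replaced by sampling clauses pushed down to the DBMS, such as PostgreSQL's `TABLESAMPLE SYSTEM`, which samples at the block level. Sampling whole blocks is what keeps I/O low, but rows within a block may be correlated, which is why the variance here is computed over per-block sums and not over individual rows.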

🔎 Similar Papers

Roq: Robust Query Optimization Based on a Risk-aware Learned Cost Model