🤖 AI Summary
Existing approximate query processing (AQP) techniques struggle to simultaneously guarantee bounded error, incur no maintenance overhead, and remain transparent to the underlying DBMS, which has hindered their industrial adoption. This paper proposes TAQA, a database-agnostic, two-stage online AQP algorithm that provides all three properties for arbitrary queries, together with BSAP, a block-level sampling technique with statistical guarantees that keeps TAQA faster than exact execution. The approach enforces user-specified error bounds (e.g., 5%) without modifying the DBMS or requiring offline maintenance. TAQA and BSAP are implemented in PilotDB, a prototype middleware system compatible with any DBMS that supports efficient block-level sampling. Evaluated on PostgreSQL, SQL Server, and DuckDB over real-world benchmarks, PilotDB achieves up to 126× speedup over exact queries with a 5% guaranteed error.
📝 Abstract
After decades of research in approximate query processing (AQP), its adoption in industry remains limited. Existing methods struggle to simultaneously provide user-specified error guarantees, eliminate maintenance overheads, and avoid modifications to database management systems. To address these challenges, we introduce two novel techniques, TAQA and BSAP. TAQA is a two-stage online AQP algorithm that achieves all three properties for arbitrary queries. However, it can be slower than exact queries if we use standard row-level sampling. BSAP resolves this by enabling block-level sampling with statistical guarantees in TAQA. We implement TAQA and BSAP in a prototype middleware system, PilotDB, that is compatible with all DBMSs supporting efficient block-level sampling. We evaluate PilotDB on PostgreSQL, SQL Server, and DuckDB over real-world benchmarks, demonstrating up to 126× speedups when running with a 5% guaranteed error.
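The general shape of a two-stage online approach over block samples can be illustrated with a minimal sketch. It assumes a SUM aggregate, uniform block sampling without replacement, a relative error target, and a normal (CLT) approximation. The function names `block_sum_estimate` and `required_blocks` and the synthetic table are hypothetical. A small pilot sample of blocks is used to choose the final sample size, and a second block sample produces the estimate. This is textbook cluster sampling, not the TAQA or BSAP procedure from the paper: the plug-in planning step ignores the pilot's own estimation error, so it does not provide an a priori guarantee.

```python
import math
import random
import statistics
from statistics import NormalDist


def block_sum_estimate(block_sums, total_blocks):
    """Estimate a table-wide SUM from the per-block sums of a uniform sample of
    blocks drawn without replacement; returns (estimate, standard_error)."""
    n = len(block_sums)
    mean = statistics.fmean(block_sums)
    var = statistics.variance(block_sums)  # sample variance across sampled blocks
    fpc = 1 - n / total_blocks  # finite population correction
    estimate = total_blocks * mean  # expansion estimator: N * average block sum
    std_err = total_blocks * math.sqrt(var / n * fpc)
    return estimate, std_err


def required_blocks(pilot_block_sums, total_blocks, rel_error, confidence):
    """Stage 1 (planning): from a pilot sample, pick how many blocks make the
    normal-approximation CI half-width about rel_error * estimate.
    Naive plug-in: it ignores the pilot's own estimation error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    mean = statistics.fmean(pilot_block_sums)
    sd = statistics.stdev(pilot_block_sums)
    # z * N * sd / sqrt(n) <= rel_error * N * |mean|
    #   =>  n >= (z * sd / (rel_error * |mean|))^2
    n = math.ceil((z * sd / (rel_error * abs(mean))) ** 2)
    return min(max(n, 2), total_blocks)


# Hypothetical table: 2,000 blocks of 50 rows each.
random.seed(0)
table = [[random.expovariate(1.0) for _ in range(50)] for _ in range(2_000)]
N = len(table)
exact = sum(sum(block) for block in table)

# Stage 1: pilot query over a small block sample to plan the sample size.
pilot = [sum(block) for block in random.sample(table, 50)]
n = required_blocks(pilot, N, rel_error=0.05, confidence=0.95)

# Stage 2: final query over n freshly sampled blocks.
final = [sum(block) for block in random.sample(table, n)]
est, se = block_sum_estimate(final, N)
print(f"blocks={n} estimate={est:.0f} exact={exact:.0f} "
      f"rel_err={abs(est - exact) / exact:.3%}")
```

Because whole blocks are the sampling unit, the estimator's variance is driven by the variation between block sums. Data that is physically clustered (for example, rows stored in order of the aggregated column) therefore needs more blocks than the same number of rows sampled individually would suggest.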