SIMD-PAC-DB: Pretty Performant PAC Privacy

📅 2026-03-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

The original PAC-DB privacy mechanism executes each query 128 times, each time against a different random 50% sub-sample of the database, which makes it slow and hard to deploy. This work treats each bit of a hashed primary key as a membership flag for one sub-sample, so a single query can compute the same privatized answer. The stochastic aggregates are computed with new approximate algorithms that are SIMD-friendly and run up to 40× faster than scalar equivalents. An open-source DuckDB community extension includes a rewriter that PAC-privatizes arbitrary SQL queries. Experiments on TPC-H, ClickBench, and SQLStorm evaluate thousands of queries for both performance and utility.

📝 Abstract

This work presents a highly optimized implementation of PAC-DB, a recent and promising database privacy model. We prove that our SIMD-PAC-DB can compute the same privatized answer with just a single query, instead of the 128 stochastic executions against different 50% database sub-samples needed by the original PAC-DB. Our key insight is that every bit of a hashed primary key can be read as a membership flag for one such sub-sample. We present new algorithms for approximate computation of stochastic aggregates based on these hashes, which, thanks to their SIMD-friendliness, run up to 40x faster than scalar equivalents. We release an open-source DuckDB community extension which includes a rewriter that PAC-privatizes arbitrary SQL queries. Our experiments on TPC-H, ClickBench, and SQLStorm evaluate thousands of queries in terms of performance and utility. Together, these contributions improve the ease of use and functionality of privacy-aware data systems in practice.
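The hash-bit idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the BLAKE2b hash, the function names `key_hash`, `single_pass_sums`, and `repeated_sums`, and the toy table are all assumptions made for illustration. It computes exact per-sub-sample sums only, and leaves out both the PAC noise step and the approximate SIMD algorithms. What it shows is that one scan which routes each row by the 128 bits of its key hash yields the same 128 sums as 128 separate filtered scans.

```python
import hashlib

NUM_SUBSAMPLES = 128  # one sub-sample per hash bit, matching the 128 executions in the abstract


def key_hash(primary_key: int) -> int:
    """Hash a primary key to a 128-bit integer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(str(primary_key).encode(), digest_size=16).digest()
    return int.from_bytes(digest, "little")


def single_pass_sums(rows):
    """One scan: add each value to every sub-sample whose hash bit is set."""
    sums = [0] * NUM_SUBSAMPLES
    for primary_key, value in rows:
        h = key_hash(primary_key)
        for j in range(NUM_SUBSAMPLES):
            # Branch-free form: multiply the value by the membership bit (0 or 1).
            sums[j] += value * ((h >> j) & 1)
    return sums


def repeated_sums(rows):
    """Baseline: 128 separate scans, one filtered SUM per sub-sample."""
    return [
        sum(value for primary_key, value in rows if (key_hash(primary_key) >> j) & 1)
        for j in range(NUM_SUBSAMPLES)
    ]


# Hypothetical toy table of (primary_key, value) rows with integer values.
rows = [(pk, pk % 7 + 1) for pk in range(1, 1001)]
assert single_pass_sums(rows) == repeated_sums(rows)
```

Because each bit of a well-mixed hash is set for roughly half of the keys, each bit position selects an approximately 50% sub-sample. The inner loop over bit positions does the same multiply-and-add for every position, which is the kind of regular work that suits SIMD lanes.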

Problem

Research questions and friction points this paper is trying to address.

PAC-DB

database privacy

SIMD optimization

privacy-preserving query

stochastic aggregation

Innovation

Methods, ideas, or system contributions that make the work stand out.

SIMD

PAC-DB

privacy-preserving databases