🤖 AI Summary
Differential privacy alone does not satisfy practical governance requirements such as the minimum frequency rule, under which each released group must contain contributions from at least $k$ distinct individuals. This work presents DPSQL+, a privacy-preserving SQL library that enforces user-level $(\varepsilon, \delta)$-differential privacy together with the minimum frequency rule. DPSQL+ has a modular architecture: a Validator that statically restricts queries to a DP-safe subset of SQL, an Accountant that tracks cumulative privacy loss across queries, and a Backend that interfaces with multiple database engines. On the TPC-H benchmark, DPSQL+ achieves practical accuracy on workloads ranging from basic aggregates to quadratic statistics and joins, and it allows substantially more queries under a fixed global privacy budget than the prior libraries evaluated.
📝 Abstract
SQL is the de facto interface for exploratory data analysis; however, releasing exact query results can expose sensitive information through membership or attribute inference attacks. Differential privacy (DP) provides rigorous privacy guarantees, but in practice, DP alone may not satisfy governance requirements such as the \emph{minimum frequency rule}, which requires each released group (cell) to include contributions from at least $k$ distinct individuals. In this paper, we present \textbf{DPSQL+}, a privacy-preserving SQL library that simultaneously enforces user-level $(\varepsilon,\delta)$-DP and the minimum frequency rule. DPSQL+ adopts a modular architecture consisting of: (i) a \emph{Validator} that statically restricts queries to a DP-safe subset of SQL; (ii) an \emph{Accountant} that consistently tracks cumulative privacy loss across multiple queries; and (iii) a \emph{Backend} that interfaces with various database engines, ensuring portability and extensibility. Experiments on the TPC-H benchmark demonstrate that DPSQL+ achieves practical accuracy across a wide range of analytical workloads -- from basic aggregates to quadratic statistics and join operations -- and allows substantially more queries under a fixed global privacy budget than prior libraries in our evaluation.
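To make the interplay between budget accounting and the minimum frequency rule concrete, the following is a minimal Python sketch. It is not the DPSQL+ API: the abstract does not describe the library's interfaces or mechanisms, so the names `Accountant`, `charge`, and `private_group_counts` are hypothetical. The sketch assumes basic sequential composition for accounting, limits each user to a single group, adds Laplace noise to the distinct-user count, and releases a group only if its noisy count reaches $k + \ln(1/(2\delta))/\varepsilon$. Under the standard thresholding argument this choice yields $(\varepsilon,\delta)$-DP for the set of released groups, and a group with fewer than $k$ true users passes with probability below $\delta$. The Validator and Backend components are omitted.

```python
import math
import random
from collections import defaultdict


class Accountant:
    """Hypothetical budget tracker using basic sequential composition."""

    def __init__(self, epsilon_budget, delta_budget):
        self.epsilon_budget = epsilon_budget
        self.delta_budget = delta_budget
        self.epsilon_spent = 0.0
        self.delta_spent = 0.0

    def charge(self, epsilon, delta):
        # Refuse the query if it would exceed the global budget.
        tol = 1e-12  # tolerance for floating-point accumulation
        if (self.epsilon_spent + epsilon > self.epsilon_budget + tol
                or self.delta_spent + delta > self.delta_budget + tol):
            raise RuntimeError("privacy budget exhausted")
        self.epsilon_spent += epsilon
        self.delta_spent += delta


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_group_counts(rows, epsilon, delta, k, accountant, rng=None):
    """Noisy distinct-user counts per group with a release threshold.

    rows: iterable of (user_id, group_key) pairs.
    Assumption: each user is limited to the first group seen for them,
    so one user changes exactly one count by at most 1.
    """
    if epsilon <= 0 or not (0 < delta < 0.5) or k < 1:
        raise ValueError("need epsilon > 0, 0 < delta < 0.5, k >= 1")
    accountant.charge(epsilon, delta)
    # random.Random is not cryptographically secure; illustration only.
    rng = rng or random.Random()

    first_group = {}
    for user_id, group in rows:
        first_group.setdefault(user_id, group)

    counts = defaultdict(int)
    for group in first_group.values():
        counts[group] += 1

    # Threshold on the noisy count (an assumed design, not the paper's).
    threshold = k + math.log(1.0 / (2.0 * delta)) / epsilon
    released = {}
    for group, count in counts.items():
        noisy = count + laplace_noise(1.0 / epsilon, rng)
        if noisy >= threshold:
            released[group] = noisy
    return released
```

A caller would create one `Accountant` per dataset and pass it to every query, so that each call to `private_group_counts` draws down the same global budget and fails once it is exhausted. Tighter composition theorems would allow more queries for the same budget; basic composition is used here only because it is the simplest correct choice.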