🤖 AI Summary
Problem: Existing privacy-preserving SQL query sanitization systems differ widely in design paradigm, privacy model, protection granularity, and system architecture, which hinders systematic comparison and informed adoption.
Method: This paper proposes the first multidimensional taxonomy of sanitization systems for analytical queries, spanning privacy models, protection units, and system architectures. It also conducts a systematic empirical evaluation of 12 representative systems built on mainstream mechanisms such as k-anonymity and differential privacy.
Contribution/Results: Using a unified benchmark, the authors quantitatively characterize the fundamental trade-offs among data utility, query latency, and privacy guarantees. The analysis delineates the performance boundaries of each approach and identifies the deployment scenarios each one suits best. These findings provide both theoretical foundations and practical guidelines for selecting, designing, and standardizing privacy-enhancing database systems.
📝 Abstract
Analytical SQL queries are essential for extracting insights from relational databases, but they can also expose sensitive information and therefore pose significant privacy risks. To mitigate these risks, numerous query sanitization systems have been developed, and their diverse approaches create a complex landscape for researchers and practitioners alike. These systems differ fundamentally in their design: the underlying privacy model, such as k-anonymity or differential privacy; the protected privacy unit, either the tuple level or the user level; and the software architecture, which can be proxy-based or integrated into the database system. This paper classifies state-of-the-art SQL sanitization systems according to these qualitative criteria and the scope of queries they support. Furthermore, we present a quantitative analysis of leading systems, empirically measuring the trade-offs among data utility, query execution overhead, and privacy guarantees across a range of analytical queries. This work offers a structured overview and performance assessment intended to clarify the capabilities and limitations of current privacy-preserving database technologies.
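The choice of privacy unit directly determines how much noise a differentially private answer needs. As an illustration only, the following Python sketch answers a `COUNT(*)` query with the Laplace mechanism at the tuple level and at the user level. It assumes rows are dictionaries with a hypothetical `user` column and bounds each user's contribution by simple truncation. The helper names (`laplace_noise`, `bound_contributions`, `dp_count`) are hypothetical and do not describe how any of the surveyed systems is implemented.

```python
import random
from collections import defaultdict
from typing import Any, Dict, List, Optional

# Hypothetical helpers for illustration; not taken from any surveyed system.
Row = Dict[str, Any]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def bound_contributions(rows: List[Row], user_key: str,
                        max_rows_per_user: int) -> List[Row]:
    """Keep at most `max_rows_per_user` rows per user, in input order."""
    kept: List[Row] = []
    seen: Dict[Any, int] = defaultdict(int)
    for row in rows:
        uid = row[user_key]
        if seen[uid] < max_rows_per_user:
            seen[uid] += 1
            kept.append(row)
    return kept


def dp_count(rows: List[Row], epsilon: float, rng: random.Random,
             user_key: Optional[str] = None,
             max_rows_per_user: int = 1) -> float:
    """Noisy COUNT(*): tuple-level DP if user_key is None, else user-level DP."""
    if user_key is None:
        # Tuple level: neighboring databases differ in one row -> sensitivity 1.
        data, sensitivity = rows, 1.0
    else:
        # User level: neighbors differ in all rows of one user. After bounding,
        # a single user can change the count by at most max_rows_per_user.
        data = bound_contributions(rows, user_key, max_rows_per_user)
        sensitivity = float(max_rows_per_user)
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return len(data) + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    visits = [{"user": "alice"}] * 5 + [{"user": "bob"}]
    print(dp_count(visits, epsilon=1.0, rng=rng))  # tuple level, true count 6
    print(dp_count(visits, epsilon=1.0, rng=rng,
                   user_key="user", max_rows_per_user=2))  # user level, bounded count 3
```

With the same ε, the user-level answer here carries Laplace noise of scale 2 instead of 1, because one user may contribute up to two rows. Truncation also biases the count downward (3 instead of 6). Both effects are concrete instances of the utility cost of protecting a coarser privacy unit.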