🤖 AI Summary
This paper addresses the limited utility and robustness of synthetic tabular data generation under differential privacy (DP). We introduce dpmm, an open-source library that unifies three major classes of DP marginal models—PrivBayes, MST, and AIM—under a rigorously end-to-end DP-compliant framework, while rectifying previously overlooked privacy vulnerabilities. By optimizing privacy budget allocation, calibrated noise injection, and model ensembling, dpmm significantly improves data utility: downstream machine learning tasks achieve 12–28% higher accuracy across multiple benchmark datasets compared to state-of-the-art DP synthesis tools. Designed for industrial deployment, dpmm supports fine-grained configuration, one-command installation, and scalable execution, thereby bridging the gap in production-ready, verifiable, and extensible DP synthetic data libraries. The implementation is publicly available and has been widely adopted by the research and practitioner communities.
📝 Abstract
We propose dpmm, an open-source library for synthetic data generation with Differentially Private (DP) guarantees. It includes three popular marginal models -- PrivBayes, MST, and AIM -- that achieve superior utility and offer richer functionality compared to alternative implementations. Additionally, we adopt best practices to provide end-to-end DP guarantees and address well-known DP-related vulnerabilities. Our goal is to accommodate a wide audience with easy-to-install, highly customizable, and robust model implementations. Our codebase is available from https://github.com/sassoftware/dpmm.