AI Summary
This work addresses the problem of generating theoretically optimal synthetic data for k-smooth queries over the hypercube under (ε, δ)-differential privacy. By extending the Chebyshev moment matching framework and integrating function approximation under high-order derivative constraints with privacy mechanism design, we propose a polynomial-time algorithm that applies to all k-smooth queries. We establish, for the first time, the minimax lower bound in this setting, revealing a phase transition in the error rate at k = d. Our method achieves an error rate of n^{−min{1, k/d}} (up to logarithmic factors), which strictly improves upon existing approaches and represents a significant advance in both utility and theoretical optimality.
Abstract
Differentially private synthetic data enables the sharing and analysis of sensitive datasets while providing rigorous privacy guarantees for individual contributors. A central challenge is to achieve strong utility guarantees for meaningful downstream analysis. Many existing methods ensure uniform accuracy over broad query classes, such as all Lipschitz functions, but this level of generality often leads to suboptimal rates for statistics of practical interest. Since many common data analysis queries exhibit smoothness beyond what worst-case Lipschitz bounds capture, we ask whether exploiting this additional structure can yield improved utility. We study the problem of generating $(\varepsilon,\delta)$-differentially private synthetic data from a dataset of size $n$ supported on the hypercube $[-1,1]^d$, with utility guarantees uniformly for all smooth queries having bounded derivatives up to order $k$. We propose a polynomial-time algorithm that achieves a minimax error rate of $n^{-\min \{1, \frac{k}{d}\}}$, up to a $\log(n)$ factor. This characterization uncovers a phase transition at $k=d$. Our results generalize the Chebyshev moment matching framework of (Musco et al., 2025; Wang et al., 2016) and strictly improve the error rates for $k$-smooth queries established in (Wang et al., 2016). Moreover, we establish the first minimax lower bound for the utility of $(\varepsilon,\delta)$-differentially private synthetic data with respect to $k$-smooth queries, extending the Wasserstein lower bound for $\varepsilon$-differential privacy in (Boedihardjo et al., 2024).
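To make the phase transition at $k=d$ concrete, the following minimal sketch evaluates the stated rate $n^{-\min\{1, k/d\}}$ for a few smoothness levels. It ignores the $\log(n)$ factor and all constants, and the function names `rate_exponent` and `error_rate` are illustrative assumptions, not part of the paper's algorithm.

```python
def rate_exponent(k: int, d: int) -> float:
    """Exponent min(1, k/d) appearing in the rate n^{-min(1, k/d)}."""
    if k <= 0 or d <= 0:
        raise ValueError("k and d must be positive integers")
    return min(1.0, k / d)


def error_rate(n: int, k: int, d: int) -> float:
    """Rate n^{-min(1, k/d)}, with log factors and constants dropped."""
    if n <= 0:
        raise ValueError("n must be positive")
    return n ** (-rate_exponent(k, d))


if __name__ == "__main__":
    n, d = 10_000, 4  # hypothetical dataset size and dimension
    for k in (1, 2, 4, 8):
        # For k < d the exponent grows with k; for k >= d it saturates at 1.
        print(f"k={k}: exponent={rate_exponent(k, d):.2f}, "
              f"rate={error_rate(n, k, d):.2e}")
```

For $k < d$ the exponent $k/d$ improves with each additional order of smoothness, while for $k \ge d$ it is capped at 1, so further smoothness no longer changes the polynomial rate in $n$.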