Privacy-Utility Trade-off in Data Publication: A Bilevel Optimization Framework with Curvature-Guided Perturbation

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the fundamental trade-off between privacy preservation and data utility in statistical data publishing, this paper proposes a geometry-guided bilevel optimization framework. At the upper level, it maximizes data utility—measured by generative fidelity and sample diversity—while the lower level quantifies individual vulnerability via manifold-based extrinsic curvature, guiding latent-space perturbations to enhance robustness against membership inference attacks. Crucially, this work is the first to incorporate extrinsic curvature into privacy mechanism design, enabling a discriminator-guided generative defense. Extensive experiments demonstrate that our method significantly improves resistance to membership inference while simultaneously achieving superior sample quality and diversity compared to state-of-the-art baselines. By dynamically balancing privacy guarantees and utility retention through geometric priors, the framework establishes a new paradigm for privacy-preserving generative modeling.

📝 Abstract
Machine learning models require datasets for effective training, but directly sharing raw data poses significant privacy risks such as membership inference attacks (MIA). To mitigate these risks, privacy-preserving techniques such as data perturbation, generalization, and synthetic data generation are commonly employed. However, these methods often degrade data accuracy, specificity, and diversity, limiting the performance of downstream tasks and thus reducing data utility. Striking an optimal balance between privacy preservation and data utility therefore remains a critical challenge. To address this issue, we introduce a novel bilevel optimization framework for the publication of private datasets, where the upper-level task focuses on data utility and the lower-level task focuses on data privacy. In the upper-level task, a discriminator guides the generation process to ensure that perturbed latent variables are mapped to high-quality samples, maintaining fidelity for downstream tasks. In the lower-level task, our framework employs local extrinsic curvature on the data manifold as a quantitative measure of individual vulnerability to MIA, providing a geometric foundation for targeted privacy protection. By perturbing samples toward low-curvature regions, our method effectively suppresses distinctive feature combinations that are vulnerable to MIA. Through alternating optimization of both objectives, we achieve a synergistic balance between privacy and utility. Extensive experimental evaluations demonstrate that our method not only enhances resistance to MIA in downstream tasks but also surpasses existing methods in terms of sample quality and diversity.
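The lower-level idea in the abstract (scoring each point's MIA vulnerability by local curvature on the data manifold, then nudging it toward flatter regions) can be illustrated with a minimal sketch. The PCA-residual proxy and the neighbor-targeting rule below are assumptions for illustration only; the paper's actual extrinsic-curvature estimator and latent-space perturbation are not reproduced here.

```python
import numpy as np

def curvature_proxy(X, i, k=10, d=2):
    """Rough local-curvature proxy for point X[i]: the fraction of
    neighborhood variance lying outside the best-fit d-dimensional
    tangent plane (PCA residual). Higher values suggest more "bent",
    distinctive regions of the data manifold."""
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = X[np.argsort(dists)[1:k + 1]]      # k nearest neighbors (skip self)
    centered = nbrs - nbrs.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False) ** 2  # variances per direction
    return s[d:].sum() / s.sum()              # residual-variance ratio in [0, 1]

def perturb_toward_low_curvature(X, i, k=10, step=0.5):
    """Move X[i] a small step toward its neighbor with the lowest
    curvature proxy: a crude stand-in for curvature-guided perturbation."""
    dists = np.linalg.norm(X - X[i], axis=1)
    idx = np.argsort(dists)[1:k + 1]
    target = min(idx, key=lambda j: curvature_proxy(X, j, k))
    return X[i] + step * (X[target] - X[i])
```

In the paper's framework this perturbation acts on latent variables and is counterbalanced by the discriminator at the upper level, so quality is preserved while distinctive, high-curvature feature combinations are suppressed.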
Problem

Research questions and friction points this paper is trying to address.

Optimizing privacy-utility trade-off in data publication
Mitigating membership inference attacks via curvature-guided perturbation
Maintaining data fidelity while enhancing privacy protection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bilevel optimization framework balancing privacy and utility
Curvature-guided perturbation targeting vulnerable data features
Alternating optimization achieving synergistic privacy-utility trade-off