HeteroFedSyn: Differentially Private Tabular Data Synthesis for Heterogeneous Federated Settings

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses distribution heterogeneity in horizontally federated settings, which causes existing differentially private tabular data synthesis methods to produce either high bias or excessive noise. The authors propose a 2-way marginal-based synthetic data framework tailored to horizontal federated learning that integrates an L2 dependence measure with random projections to capture feature correlations noise-efficiently. They design an unbiased estimator that corrects for multiplicative noise and introduce an adaptive marginal selection strategy that dynamically updates dependence scores to avoid redundant selections. Under strict differential privacy guarantees, the method achieves synthetic data utility on par with centralized approaches on range queries, Wasserstein distance, and downstream machine learning tasks, effectively balancing privacy preservation and data fidelity.
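The summary's first innovation, an L2 dependence measure estimated via random projections, can be sketched as follows. The sketch below scores a feature pair by the squared L2 distance between its empirical joint distribution and the product of its marginals, and estimates that squared norm with Rademacher random projections (E[(r·v)²] = ‖v‖²) instead of working with the flattened joint directly. All names and the exact score definition here are illustrative assumptions, not the paper's actual API, and the DP noise addition is omitted.

```python
import numpy as np

def l2_dependence_rp(x, y, n_proj=64, seed=0):
    """Estimate ||P_xy - P_x P_y||_2^2 for two discrete features.

    Instead of using the flattened joint table directly, we estimate its
    squared L2 norm via Rademacher random projections, for which
    E[(r . v)^2] = ||v||_2^2 holds exactly. Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    kx, ky = x.max() + 1, y.max() + 1
    n = len(x)
    # empirical joint distribution and its marginals
    joint = np.zeros((kx, ky))
    np.add.at(joint, (x, y), 1.0 / n)
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    # v = joint minus independence baseline, flattened
    v = (joint - np.outer(px, py)).ravel()
    # mean of squared Rademacher projections is unbiased for ||v||^2
    r = rng.choice([-1.0, 1.0], size=(n_proj, v.size))
    return float(np.mean((r @ v) ** 2))
```

For perfectly dependent binary features the true score is 0.25, and for independent ones it is exactly 0; the projection estimate concentrates around these values as `n_proj` grows.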

📝 Abstract
Traditional Differential Privacy (DP) mechanisms are typically tailored to specific analysis tasks, which limits the reusability of protected data. DP tabular data synthesis overcomes this by generating synthetic datasets that can be shared for arbitrary downstream tasks. However, existing synthesis methods predominantly assume centralized or local settings and overlook the more practical horizontal federated scenario. Naively synthesizing data locally or perturbing individual records either produces biased mixtures or introduces excessive noise, especially under heterogeneous data distributions across participants. We propose HeteroFedSyn, the first DP tabular data synthesis framework designed specifically for the horizontal federated setting. Built upon the PrivSyn paradigm of 2-way marginal-based synthesis, HeteroFedSyn introduces three key innovations for distributed marginal selection: (i) an L2-based dependency metric with random projection for noise-efficient correlation measurement, (ii) an unbiased estimator to correct multiplicative noise, and (iii) an adaptive selection strategy that dynamically updates dependency scores to avoid redundancy. Extensive experiments on range queries, Wasserstein fidelity, and machine learning tasks show that, despite the increased noise inherent to federated execution, HeteroFedSyn achieves utility comparable to centralized synthesis. Our code is open-sourced.
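The abstract's third innovation, adaptive selection with dynamically updated dependency scores, admits a simple greedy reading: pick the highest-scoring 2-way marginal, then discount the scores of remaining pairs that overlap it, so later picks cover correlations not yet captured. The sketch below is one plausible instantiation under that assumption; the paper's actual update rule, the `decay` parameter, and the DP noise on scores are not taken from the source.

```python
def select_marginals(scores, n_select, decay=0.5):
    """Greedy 2-way marginal selection with adaptive score updates.

    `scores` maps feature pairs (i, j) to dependency scores. After a pair
    is chosen, scores of remaining pairs sharing a feature with it are
    multiplied by `decay`, steering later picks toward uncovered
    correlations. Illustrative sketch only, not the paper's algorithm.
    """
    scores = dict(scores)  # work on a copy so the caller's dict survives
    chosen = []
    for _ in range(min(n_select, len(scores))):
        best = max(scores, key=scores.get)
        chosen.append(best)
        del scores[best]
        # discount pairs that overlap the chosen one to avoid redundancy
        for pair in scores:
            if set(pair) & set(best):
                scores[pair] *= decay
    return chosen
```

With scores {(0,1): 1.0, (0,2): 0.9, (2,3): 0.8}, plain top-k would pick (0,1) and (0,2), both involving feature 0; the adaptive discount instead yields (0,1) then (2,3), covering three distinct features.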
Problem

Research questions and friction points this paper is trying to address.

Differential Privacy
Tabular Data Synthesis
Federated Learning
Data Heterogeneity
Horizontal Federated Setting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differential Privacy
Federated Learning
Tabular Data Synthesis
Marginal Selection
Heterogeneous Data