VT-GAN: Cooperative Tabular Data Synthesis using Vertical Federated Learning

📅 2023-02-03

📈 Citations: 5

✨ Influential: 1

🤖 AI Summary

For privacy-sensitive scenarios where multiple parties hold vertically partitioned features, this paper proposes VT-GAN, the first generative adversarial network (GAN) framework tailored for vertical federated learning (VFL) to synthesize high-fidelity tabular data. Methodologically, it distributes the GAN generator and discriminator across the parties and introduces a training-with-shuffling technique so that no party can reconstruct training data from the conditional vector; differential privacy is added for stronger privacy guarantees. Contributions include: (1) generation quality close to that of centralized GANs without sharing raw data; (2) a gap in downstream machine learning utility, relative to a centralized GAN, that can be as low as 2.7%; (3) consistent quality even under extremely imbalanced data distributions across clients and with different numbers of clients; and (4) an evaluation of scalability and of robustness against membership inference attacks (MIAs) on a range of datasets.

📝 Abstract

This paper presents the application of Vertical Federated Learning (VFL) to generate synthetic tabular data using Generative Adversarial Networks (GANs). VFL is a collaborative approach to training machine learning models among distinct tabular data holders, such as financial institutions, that possess disjoint features for the same group of customers. In this paper we introduce the VT-GAN framework, Vertical federated Tabular GAN, and demonstrate that VFL can be successfully used to implement GANs for distributed tabular data in a privacy-preserving manner, with performance close to that of centralized GANs, which assume shared data. We make design choices with respect to the distribution of the GAN generator and discriminator models and introduce a training-with-shuffling technique so that no party can reconstruct training data from the GAN conditional vector. The paper presents (1) an implementation of VT-GAN, (2) a detailed quality evaluation of the synthetic data generated by VT-GAN, (3) an overall scalability examination of the VT-GAN framework, and (4) a security analysis of VT-GAN's robustness against Membership Inference Attacks under different Differential Privacy settings, for a range of datasets with diverse distribution characteristics. Our results demonstrate that VT-GAN can consistently generate high-fidelity synthetic tabular data of comparable quality to that generated by a centralized GAN algorithm. The difference in machine learning utility can be as low as 2.7%, even under extremely imbalanced data distributions across clients or with different numbers of clients.
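To make the vertical setting concrete, the sketch below splits a toy table by column, so that each party holds disjoint features for the same customers. It is an illustration only, not code from the paper: the table, the party names (`bank`, `insurer`), and the helper functions are all hypothetical.

```python
from typing import Dict, List

# Hypothetical toy table: one row per customer, one key per feature.
ROWS: List[dict] = [
    {"customer_id": 1, "age": 34, "income": 52000, "loan_default": 0},
    {"customer_id": 2, "age": 51, "income": 87000, "loan_default": 1},
    {"customer_id": 3, "age": 27, "income": 31000, "loan_default": 0},
]


def vertical_partition(
    rows: List[dict], id_col: str, feature_groups: Dict[str, List[str]]
) -> Dict[str, List[dict]]:
    """Split rows by column: each party keeps the shared ID plus its own features."""
    parts: Dict[str, List[dict]] = {}
    for party, features in feature_groups.items():
        parts[party] = [
            {id_col: row[id_col], **{f: row[f] for f in features}} for row in rows
        ]
    return parts


def rejoin(parts: Dict[str, List[dict]], id_col: str) -> List[dict]:
    """Rebuild the centralized view by joining on the shared ID.

    Used here only to check the split; in a VFL setting no single party
    is assumed to be able to perform this join on raw data.
    """
    merged: Dict[int, dict] = {}
    for party_rows in parts.values():
        for row in party_rows:
            merged.setdefault(row[id_col], {}).update(row)
    return [merged[key] for key in sorted(merged)]


# Assumed split: the bank holds demographics, the insurer holds the label.
parts = vertical_partition(
    ROWS, "customer_id", {"bank": ["age", "income"], "insurer": ["loan_default"]}
)
```

A centralized GAN would train on the output of `rejoin`; the setting described in the abstract is the one where only the per-party tables in `parts` exist and the joined table is never materialized.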

Problem

Research questions and friction points this paper is trying to address.

Generates synthetic tabular data using GANs.

Applies Vertical Federated Learning for privacy.

Ensures data quality and security robustness.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vertical Federated Learning applied

GANs for tabular data synthesis

Privacy-preserving training-with-shuffling technique
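The summary above does not spell out how training-with-shuffling works. One plausible reading, sketched below purely as an assumption, is that the data-holding parties permute their local rows with a seed they share among themselves, so that rows stay aligned across parties while the row order is unknown to whoever builds the conditional vector. The function names, the seed handling, and the toy rows are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def shuffled_order(n_rows: int, shared_seed: int, round_idx: int) -> List[int]:
    """Derive a row permutation from a seed shared by the data holders.

    Assumption: the seed is agreed among the clients only, and a fresh
    permutation is drawn for every training round.
    """
    rng = random.Random(f"{shared_seed}-{round_idx}")
    order = list(range(n_rows))
    rng.shuffle(order)
    return order


def shuffle_local(rows: Sequence[tuple], shared_seed: int, round_idx: int) -> List[tuple]:
    """Apply the shared permutation to one party's local rows."""
    order = shuffled_order(len(rows), shared_seed, round_idx)
    return [rows[i] for i in order]


# Hypothetical local tables, already aligned by customer ID before shuffling.
bank_rows: List[Tuple[str, int]] = [("c1", 34), ("c2", 51), ("c3", 27), ("c4", 45)]
insurer_rows: List[Tuple[str, int]] = [("c1", 0), ("c2", 1), ("c3", 0), ("c4", 1)]

SHARED_SEED = 2023  # assumed to be known to the clients only

bank_shuffled = shuffle_local(bank_rows, SHARED_SEED, round_idx=0)
insurer_shuffled = shuffle_local(insurer_rows, SHARED_SEED, round_idx=0)
```

Because both parties derive the permutation from the same seed and round index, row `i` still refers to the same customer on each side after shuffling, which is what joint training on vertically partitioned data requires.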

🔎 Similar Papers

MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data