AI Summary
This study addresses the challenge of balancing privacy preservation and data utility in synthetic data generation for highly regulated financial tabular datasets characterized by severe class imbalance and mixed data types. The authors systematically evaluate the privacy-utility trade-offs of autoencoders, generative adversarial networks (GANs), diffusion models, and copula-based methods. To better accommodate the unique characteristics of financial data, they propose novel privacy-enhanced GAN and autoencoder architectures. Their experiments provide the first empirical comparison of these generative models in such settings, revealing significant performance differences and offering both methodological innovations and empirical evidence to support the co-optimization of privacy and utility in synthetic financial data generation under stringent regulatory constraints.
Abstract
We explore the privacy-utility tradeoff of synthetic data generation schemes on tabular financial datasets, a domain characterized by high regulatory risk and severe class imbalance. We consider representative tabular data generators, including autoencoders, generative adversarial networks, diffusion, and copula synthesizers. To address the challenges of the financial domain, we provide novel privacy-preserving implementations of GAN and autoencoder synthesizers. We evaluate whether and how well the generators simultaneously achieve data quality, downstream utility, and privacy, with comparison across balanced and imbalanced input datasets. Our results offer insight into the distinct challenges of generating synthetic data from datasets that exhibit severe class imbalance and mixed-type attributes.