🤖 AI Summary
This work proposes a privacy-preserving synthetic data generation method based on fully homomorphic encryption (FHE) to address the challenges of data silos and privacy protection. It presents the first adaptation of the AIM algorithm to the FHE setting, enabling marginal-based generative models to be trained directly on encrypted tabular data, with no decryption at any stage. The approach integrates differential privacy so that the released output carries rigorous privacy guarantees. Through novel FHE protocols, the method substantially improves computational efficiency, achieving practical runtimes while maintaining synthetic data quality comparable to that of the original AIM algorithm. This demonstrates the feasibility of generating high-fidelity, strongly private synthetic data in real-world applications.
📝 Abstract
Data is the lifeblood of AI, yet much of the most valuable data remains locked in silos due to privacy concerns and regulations. As a result, AI remains heavily underutilized in many of the most important domains, including healthcare, education, and finance. Synthetic data generation (SDG), i.e., the generation of artificial data with a synthesizer trained on real data, offers an appealing way to make data available while mitigating privacy concerns; however, existing SDG-as-a-service workflows require data holders to trust providers with access to their private data. We propose FHAIM, the first fully homomorphic encryption (FHE) framework for training a marginal-based synthetic data generator on encrypted tabular data. FHAIM adapts the widely used AIM algorithm to the FHE setting using novel FHE protocols, ensuring that the private data remains encrypted throughout and is released only with differential privacy guarantees. Our empirical analysis shows that FHAIM preserves the performance of AIM while maintaining feasible runtimes.
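For intuition, the core primitive of a marginal-based generator such as AIM is a differentially private marginal measurement: a histogram of the data over a small set of columns, perturbed with calibrated noise. The following is a minimal plaintext sketch of that primitive (the function name and parameters are illustrative, not taken from the paper; FHAIM's contribution is performing such computations while the data stays encrypted under FHE, which is not shown here):

```python
import numpy as np

def noisy_marginal(column, num_categories, sigma, rng):
    """Measure a 1-way marginal (category counts) of an integer-coded
    column and add Gaussian noise, as in the Gaussian mechanism of
    differential privacy. AIM-style training fits a generative model
    to many such noisy marginals; under FHE the counts would be
    computed on ciphertexts instead of plaintext (illustrative only)."""
    counts = np.bincount(column, minlength=num_categories).astype(float)
    return counts + rng.normal(0.0, sigma, size=num_categories)

# Toy example: a categorical column with 4 categories and 1000 rows.
rng = np.random.default_rng(0)
col = rng.integers(0, 4, size=1000)
m = noisy_marginal(col, num_categories=4, sigma=1.0, rng=rng)
```

Smaller `sigma` yields more accurate marginals but weaker privacy; a full DP analysis would calibrate `sigma` to the desired (ε, δ) budget across all measured marginals.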