🤖 AI Summary
This work addresses a gap in evaluating user-level privacy risk in multi-table synthetic data: existing membership inference attacks are largely limited to single-table settings and audit at the item level. The paper defines a user-level membership inference setting for synthetic relational data and introduces MT-MIA, an attack under a No-Box threat model built on heterogeneous graph neural networks. By modeling the relationships between a user and the items connected to that user across tables, the attack uses multi-table relational information to infer whether a specific user was part of the training data. Experiments on several real-world multi-table datasets show that state-of-the-art relational synthetic data generators are vulnerable at the user level. The authors also use the attack to study where this leakage occurs, and show that single-table attacks underestimate the privacy risk in multi-table scenarios.
📝 Abstract
Synthetic tabular data has gained attention for enabling privacy-preserving data sharing. While substantial progress has been made in single-table synthetic data generation, where data are modeled at the row or item level, most real-world data exists in relational databases where a user's information spans items across multiple interconnected tables. Recent methods for synthetic relational data generation address this complexity, yet the release of these data introduces unique privacy challenges: information can be leaked not only from individual items but also through the relationships that comprise a complete user entity. To address this, we propose a novel Membership Inference Attack (MIA) setting to audit the empirical user-level privacy of synthetic relational data and show that single-table MIAs that audit at the item level underestimate user-level privacy leakage. We then propose Multi-Table Membership Inference Attack (MT-MIA), a novel adversarial attack under a No-Box threat model that targets learned representations of user entities via Heterogeneous Graph Neural Networks. By incorporating all connected items for a user, MT-MIA better targets user-level vulnerabilities induced by inter-tabular relationships than existing attacks. We evaluate MT-MIA on a range of real-world multi-table datasets and demonstrate that this vulnerability exists in state-of-the-art relational synthetic data generators. We also use MT-MIA to study where this leakage occurs.
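To make the difference between item-level and user-level auditing concrete, here is a toy sketch in Python. It is not the MT-MIA method: mean pooling over a user's child rows stands in for the learned Heterogeneous Graph Neural Network representation, and the negative distance to the nearest synthetic user stands in for the attack's membership score. The function names, the two-table layout, and all data values are hypothetical and chosen only for illustration. The only thing the sketch reads is released synthetic data, which is the access a No-Box adversary is assumed to have here.

```python
from math import dist
from statistics import mean


def embed_users(users, items, child_dim):
    """Build one vector per user: own features + mean of connected item features.

    users: {user_id: [features]}           (parent table, hypothetical layout)
    items: [(user_id, [features]), ...]    (child table keyed by user_id)
    Mean pooling is an illustrative stand-in for a learned graph representation.
    """
    by_user = {}
    for uid, feats in items:
        by_user.setdefault(uid, []).append(feats)
    embeddings = {}
    for uid, feats in users.items():
        rows = by_user.get(uid, [])
        pooled = [mean(col) for col in zip(*rows)] if rows else [0.0] * child_dim
        embeddings[uid] = list(feats) + pooled
    return embeddings


def membership_score(target_vec, synthetic_vecs):
    """Higher score = closer to some synthetic user = more member-like (toy heuristic)."""
    return -min(dist(target_vec, s) for s in synthetic_vecs)


# Released synthetic data: the only input the adversary uses in this sketch.
synthetic = embed_users(
    {"s1": [30.0], "s2": [55.0]},
    [("s1", [10.0]), ("s1", [20.0]), ("s2", [100.0])],
    child_dim=1,
)
synthetic_vecs = list(synthetic.values())

# Two target users with identical user-table rows but different connected items.
targets = embed_users(
    {"a": [30.0], "b": [30.0]},
    [("a", [10.0]), ("a", [20.0]), ("b", [90.0]), ("b", [110.0])],
    child_dim=1,
)

# User-level view: score the full pooled representation.
scores = {uid: membership_score(vec, synthetic_vecs) for uid, vec in targets.items()}

# Single-table view: score only the user-table features (first component).
user_only_scores = {
    uid: membership_score(vec[:1], [s[:1] for s in synthetic_vecs])
    for uid, vec in targets.items()
}
```

In this toy data, users `a` and `b` have the same user-table row, so `user_only_scores` cannot tell them apart. Once their connected items are pooled in, `a` sits on top of a synthetic user while `b` does not, so `scores` separates them. This only illustrates why an audit restricted to one table can miss a signal carried by cross-table relationships; it makes no claim about how large that effect is in practice.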