Winning the MIDST Challenge: New Membership Inference Attacks on Diffusion Models for Tabular Data Synthesis

📅 2025-03-15
🤖 AI Summary
Problem: Diffusion-based synthetic tabular data generators lack rigorous privacy evaluation; existing assessments often rely on heuristic metrics or weak membership inference attacks (MIAs), leaving real-world privacy risks underestimated. Method: We show that state-of-the-art MIAs designed for image models fail in tabular settings, and we identify noise initialization as a key factor in attack efficacy. To address this, we propose a machine-learning-driven MIA tailored to diffusion-based tabular synthesis that uses loss features computed across multiple noise draws and time steps. A lightweight MLP learns the membership signal from these features, replacing manual hyperparameter tuning. Contribution/Results: Our method took first place across all tracks of the MIDST Challenge at SaTML 2025, providing empirical evidence of privacy leakage in diffusion-generated tabular data.

📝 Abstract
Tabular data synthesis using diffusion models has gained significant attention for its potential to balance data utility and privacy. However, existing privacy evaluations often rely on heuristic metrics or weak membership inference attacks (MIAs), leaving privacy risks inadequately assessed. In this work, we conduct a rigorous MIA study on diffusion-based tabular synthesis, revealing that state-of-the-art attacks designed for image models fail in this setting. We identify noise initialization as a key factor influencing attack efficacy and propose a machine-learning-driven approach that leverages loss features across different noises and time steps. Our method, implemented with a lightweight MLP, effectively learns membership signals, eliminating the need for manual optimization. Experimental results from the MIDST Challenge @ SaTML 2025 demonstrate the effectiveness of our approach, securing first place across all tracks. Code is available at https://github.com/Nicholas0228/Tartan_Federer_MIDST.
Problem

Research questions and friction points this paper is trying to address.

Assessing privacy risks in diffusion-based tabular data synthesis
Improving membership inference attacks for tabular data models
Developing a machine-learning-driven approach for privacy evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine-learning-driven MIA for tabular data
Leverages loss features across noises and steps
Lightweight MLP eliminates manual optimization
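
As a rough illustration of the three points above, the following Python sketch builds a feature vector of denoising losses over several fixed noise draws and time steps, then fits a small MLP to separate members from non-members. It is not the authors' implementation (that lives in the repository linked in the abstract). The toy noise schedule, the `make_memorizing_denoiser` stand-in for an overfit diffusion model, the `TinyMLP` class, and every hyperparameter are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def loss_features(x, denoiser, timesteps, noises):
    """Denoising losses of record x, one per (noise draw, time step) pair."""
    feats = []
    for eps in noises:  # the same fixed noise draws are reused for every record
        for t in timesteps:
            a, s = math.sqrt(1.0 - t), math.sqrt(t)  # toy schedule (assumption)
            x_t = [a * xi + s * ei for xi, ei in zip(x, eps)]
            pred = denoiser(x_t, t)  # model's estimate of the injected noise
            feats.append(sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(x))
    return feats


def make_memorizing_denoiser(train):
    """Hypothetical overfit model: explains x_t via the nearest training record."""
    def denoiser(x_t, t):
        a, s = math.sqrt(1.0 - t), math.sqrt(t)
        m = min(train, key=lambda r: sum((v - a * ri) ** 2 for v, ri in zip(x_t, r)))
        return [(v - a * mi) / s for v, mi in zip(x_t, m)]
    return denoiser


class TinyMLP:
    """One-hidden-layer MLP trained with per-sample SGD on binary cross-entropy."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, f):
        h = [math.tanh(sum(w * v for w, v in zip(row, f)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        z = max(-30.0, min(30.0, z))  # clamp the logit to keep exp() well-behaved
        return h, 1.0 / (1.0 + math.exp(-z))

    def score(self, f):
        """Predicted probability that the record was a training member."""
        return self._forward(f)[1]

    def fit(self, feats, labels, epochs=300, lr=0.1):
        for _ in range(epochs):
            for f, y in zip(feats, labels):
                h, p = self._forward(f)
                g = p - y  # gradient of the loss w.r.t. the logit
                for j, hj in enumerate(h):
                    gh = g * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh
                    self.w2[j] -= lr * g * hj
                    self.b1[j] -= lr * gh
                    for i, v in enumerate(f):
                        self.w1[j][i] -= lr * gh * v
                self.b2 -= lr * g


def run_toy_attack(seed=0, dim=4, n=20):
    """Fit the MLP on toy members/non-members and return their scores."""
    rng = random.Random(seed)
    draw = lambda: [rng.gauss(0.0, 1.0) for _ in range(dim)]
    members, non_members = [draw() for _ in range(n)], [draw() for _ in range(n)]
    noises = [draw() for _ in range(4)]  # fixed noise initializations
    timesteps = (0.3, 0.5, 0.7)
    denoiser = make_memorizing_denoiser(members)
    feats = [loss_features(x, denoiser, timesteps, noises) for x in members + non_members]
    labels = [1] * n + [0] * n
    mlp = TinyMLP(len(feats[0]), 8, rng)
    mlp.fit(feats, labels)
    return [mlp.score(f) for f in feats]


if __name__ == "__main__":
    scores = run_toy_attack()
    print("mean member score:    ", sum(scores[:20]) / 20)
    print("mean non-member score:", sum(scores[20:]) / 20)
```

Reusing the same noise draws for every record keeps the feature columns comparable across records, which is one plausible reading of why noise initialization matters. For brevity the toy scores the same records it was fitted on; a real evaluation would fit the classifier on records with known membership and score held-out ones.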