Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models

Date: 2024-04-12
Venue: arXiv.org
Citations: 7
Influential citations: 1
AI Summary
Existing tabular diffusion models often inherit biases tied to sensitive attributes (e.g., gender, race) from their training data, which leads to unfair synthetic data. To address this, we propose the first fairness-aware tabular diffusion framework. Our method introduces a novel sensitive-attribute guidance mechanism that explicitly balances the joint distribution of target labels and sensitive attributes during sampling via conditional gradient guidance. We further integrate a unified embedding encoder for mixed-type features and a fairness-regularized loss to jointly optimize statistical fidelity and downstream utility. Evaluated on multiple benchmark datasets, our approach achieves improvements of over 10% in demographic parity ratio and equalized odds ratio, substantially outperforming state-of-the-art baselines while preserving data quality and model performance.

Abstract
Diffusion models have emerged as a robust framework for various generative tasks, including tabular data synthesis. However, current tabular diffusion models tend to inherit bias from the training dataset and generate biased synthetic data, which may contribute to discriminatory actions. In this research, we introduce a novel tabular diffusion model that incorporates sensitive guidance to generate fair synthetic data with balanced joint distributions of the target label and sensitive attributes, such as sex and race. The empirical results demonstrate that our method effectively mitigates bias in training data while maintaining the quality of the generated samples. Furthermore, we provide evidence that our approach outperforms existing methods for synthesizing tabular data on fairness metrics such as demographic parity ratio and equalized odds ratio, achieving improvements of over 10%. Our implementation is available at https://github.com/comp-well-org/fair-tab-diffusion.
Problem

Research questions and friction points this paper is trying to address.

Mitigates bias in synthetic tabular data generation
Balances joint distributions of target labels and sensitive attributes
Improves fairness metrics like demographic parity and equalized odds
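
For readers unfamiliar with the two fairness metrics named above, here is a minimal sketch of how they are commonly computed for binary predictions. It assumes the usual min-over-max definitions (demographic parity ratio compares per-group selection rates; equalized odds ratio takes the worse of the per-group true-positive-rate and false-positive-rate ratios). The summary does not spell out the paper's exact formulas, so the function names and edge-case handling here are illustrative assumptions, not the authors' evaluation code.

```python
from collections import defaultdict


def _rate_by_group(flags, groups):
    """Mean of binary flags (0/1) within each group."""
    totals, counts = defaultdict(int), defaultdict(int)
    for flag, group in zip(flags, groups):
        totals[group] += flag
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in counts}


def _min_max_ratio(rates):
    """Smallest group rate divided by largest group rate.

    Assumption: returns 1.0 when there is nothing to compare
    (no groups, or every group has rate 0).
    """
    if not rates:
        return 1.0
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


def demographic_parity_ratio(y_pred, sensitive):
    """Ratio of the lowest to the highest per-group positive-prediction rate."""
    return _min_max_ratio(_rate_by_group(y_pred, sensitive))


def equalized_odds_ratio(y_true, y_pred, sensitive):
    """Worse of the TPR ratio and the FPR ratio across groups."""
    ratios = []
    for label in (1, 0):  # label 1 yields TPR, label 0 yields FPR
        idx = [i for i, y in enumerate(y_true) if y == label]
        rates = _rate_by_group(
            [y_pred[i] for i in idx], [sensitive[i] for i in idx]
        )
        ratios.append(_min_max_ratio(rates))
    return min(ratios)


# Toy data: two groups, "a" and "b", four rows each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dpr = demographic_parity_ratio(y_pred, groups)
eor = equalized_odds_ratio(y_true, y_pred, groups)
```

A value of 1.0 on either metric means the groups are treated identically under that criterion; values closer to 0 indicate larger disparity.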
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion models for tabular data synthesis
Incorporates sensitive guidance to reduce bias
Improves fairness metrics over existing methods
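
The idea of a balanced joint distribution over the target label and a sensitive attribute can be illustrated with a small sketch. It only shows how one might choose the (label, sensitive attribute) conditions to sample under, so that every combination is requested equally often; it does not reproduce the paper's guidance mechanism or the diffusion sampler itself. The helper names are hypothetical.

```python
import itertools
import random
from collections import Counter


def empirical_joint(labels, sensitive):
    """Observed share of each (label, sensitive) pair in the training data."""
    counts = Counter(zip(labels, sensitive))
    n = len(labels)
    return {pair: c / n for pair, c in counts.items()}


def balanced_condition_pairs(labels, sensitive, n_samples, seed=0):
    """Return n_samples (label, sensitive) pairs with every combination
    of observed values requested (near-)equally often.

    Assumption: a conditional sampler, not shown here, would consume
    one pair per generated row.
    """
    cells = list(
        itertools.product(sorted(set(labels)), sorted(set(sensitive)))
    )
    # Round-robin over the cells gives an exactly balanced allocation
    # whenever n_samples is a multiple of the number of cells.
    pairs = [cells[i % len(cells)] for i in range(n_samples)]
    random.Random(seed).shuffle(pairs)
    return pairs


# Skewed toy training data: only one row has label 1, only one row is "F".
train_labels = [0, 0, 0, 1]
train_sensitive = ["F", "M", "M", "M"]

skewed = empirical_joint(train_labels, train_sensitive)
conditions = balanced_condition_pairs(train_labels, train_sensitive, n_samples=8)
balanced_counts = Counter(conditions)
```

In this toy case the training data never contains the pair (1, "F"), yet the balanced allocation still requests it as often as every other combination, which is the kind of rebalancing the summary describes.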
Authors

Zeyu Yang
Department of Electrical and Computer Engineering, Rice University, Houston, USA
Peikun Guo
Department of Electrical and Computer Engineering, Rice University, Houston, USA
Khadija Zanna
Rice University
AI Security, HCI, Affective computing, Digital Health
Akane Sano
Department of Electrical and Computer Engineering, Rice University, Houston, USA