Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions

📅 2025-02-24
🤖 AI Summary
Problem: A systematic survey of diffusion models for tabular data generation has been lacking. Method: This work presents the first comprehensive review of relevant research from 2015 to 2024. It introduces a unified taxonomy covering the major works of this period, maintains an open-source GitHub repository that is kept up to date, and analyzes heterogeneous approaches within a consistent mathematical framework. Key technical foci include probabilistic modeling, hybrid denoising for mixed discrete and continuous features, conditional diffusion architectures, and evaluation benchmarks. Contribution/Results: The survey synthesizes over 30 seminal works, identifies fundamental modeling challenges (high-dimensional sparsity, feature heterogeneity, and structural disorder), and delineates empirical performance boundaries. It offers a theoretical analysis framework for academia and a reusable methodological guide for industry, filling a gap in the domain-specific survey literature.
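
"Discrete-continuous hybrid denoising" means that numerical and categorical columns of a table are corrupted by different noising processes. Below is a minimal sketch of one forward (noising) step, assuming a TabDDPM-style design: Gaussian diffusion for numerical columns and multinomial diffusion with uniform noise for categorical columns, under a linear DDPM beta schedule. The function names (`linear_alpha_bar`, `noise_numerical`, `noise_categorical`) and the toy data are hypothetical illustrations, not code from the survey or its repository.

```python
import numpy as np

# Hypothetical helper: cumulative products alpha_bar_t for a linear beta schedule
# (DDPM-style defaults; assumed here for illustration).
def linear_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

# Hypothetical helper: Gaussian forward step for numerical columns,
# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
def noise_numerical(x0, t, alpha_bar, rng):
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# Hypothetical helper: multinomial forward step for one categorical column.
# The one-hot x_0 is mixed with the uniform distribution over K classes,
# then one category is sampled per row.
def noise_categorical(x0_idx, num_classes, t, alpha_bar, rng):
    one_hot = np.eye(num_classes)[x0_idx]                    # shape (n, K)
    probs = alpha_bar[t] * one_hot + (1.0 - alpha_bar[t]) / num_classes
    cum = probs.cumsum(axis=1)
    cum[:, -1] = 1.0                                         # guard against rounding error
    u = rng.random((probs.shape[0], 1))                      # one uniform draw per row
    return (u < cum).argmax(axis=1)                          # inverse-CDF sampling

# Toy usage: 5 rows, 3 numerical columns, and one categorical column with 3 levels.
rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar(T=1000)
x_num = rng.standard_normal((5, 3))
x_cat = np.array([0, 2, 1, 2, 0])
xt_num = noise_numerical(x_num, t=500, alpha_bar=alpha_bar, rng=rng)
xt_cat = noise_categorical(x_cat, num_classes=3, t=500, alpha_bar=alpha_bar, rng=rng)
```

The sketch covers only the forward process. In a full model of this kind, a single denoising network would be trained to reverse both processes, predicting the Gaussian noise for numerical columns and the clean category for categorical columns.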

📝 Abstract
In recent years, generative models have achieved remarkable performance across diverse applications, including image generation, text synthesis, audio creation, video generation, and data augmentation. Diffusion models have emerged as superior alternatives to Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) by addressing their limitations, such as training instability, mode collapse, and poor representation of multimodal distributions. This success has spurred widespread research interest. In the domain of tabular data, diffusion models have begun to show similar advantages over GANs and VAEs, achieving significant performance breakthroughs and demonstrating their potential for addressing the unique challenges of tabular data modeling. However, while domains such as images and time series have numerous surveys summarizing advances in diffusion models, there remains a notable gap for tabular data: despite growing interest, there has been little effort to systematically review and summarize these developments. The lack of a dedicated survey limits a clear understanding of the challenges, progress, and future directions in this area. This survey addresses the gap by providing a comprehensive review of diffusion models for tabular data. Covering works from June 2015, when diffusion models emerged, to December 2024, we analyze nearly all relevant studies, with updates maintained in a GitHub repository (https://github.com/Diffusion-Model-Leiden/awesome-diffusion-models-for-tabular-data). Assuming readers have foundational knowledge of statistics and diffusion models, we use mathematical formulations to deliver a rigorous and detailed review, aiming to promote developments in this emerging and exciting area.
Problem

Research questions and friction points this paper is trying to address.

Reviewing diffusion models for tabular data
Addressing challenges and progress in tabular data modeling
Summarizing future directions in diffusion model applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion models for tabular data
Overcomes GAN and VAE limitations
Comprehensive review from 2015 to 2024

Zhong Li (LIACS, Leiden University, the Netherlands)
Qi Huang (LIACS, Leiden University, the Netherlands)
Lincen Yang (LIACS, Leiden University, the Netherlands)
Jiayang Shi (Leiden University)
Zhao Yang (VU Amsterdam)
N. V. Stein (LIACS, Leiden University, the Netherlands)
Thomas Bäck (LIACS, Leiden University, the Netherlands)
M. Leeuwen (LIACS, Leiden University, the Netherlands)