Exponential Family Variational Flow Matching for Tabular Data Generation

📅 2025-06-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing denoising diffusion and flow matching methods struggle to jointly satisfy differentiability, efficiency, and distribution fidelity when modeling tabular data with mixed continuous and discrete features. Method: We propose Exponential Family Variational Flow Matching (EF-VFM), the first flow matching framework generalized to heterogeneous tabular data. EF-VFM establishes a theoretical connection to Bregman divergence–driven generalized flow matching and introduces a unified exponential family distribution modeling scheme for end-to-end joint generation of mixed-type features. Its data-driven moment-matching objective enables efficient, fully differentiable optimization. Results: On multiple standard tabular benchmarks, EF-VFM achieves state-of-the-art performance—outperforming existing diffusion and flow matching baselines across generation quality, training efficiency, and fidelity of both marginal and joint distributions—demonstrating balanced improvements across all three dimensions.

Technology Category

Application Category

📝 Abstract
While denoising diffusion and flow matching have driven major advances in generative modeling, their application to tabular data remains limited, despite its ubiquity in real-world applications. To this end, we develop TabbyFlow, a variational Flow Matching (VFM) method for tabular data generation. To apply VFM to data with mixed continuous and discrete features, we introduce Exponential Family Variational Flow Matching (EF-VFM), which represents heterogeneous data types using a general exponential family distribution. We hereby obtain an efficient, data-driven objective based on moment matching, enabling principled learning of probability paths over mixed continuous and discrete variables. We also establish a connection between variational flow matching and generalized flow matching objectives based on Bregman divergences. Evaluation on tabular data benchmarks demonstrates state-of-the-art performance compared to baselines.
Problem

Research questions and friction points this paper is trying to address.

Extending flow matching to tabular data generation
Handling mixed continuous and discrete features
Improving generative modeling for heterogeneous data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exponential Family Variational Flow Matching
Moment matching for mixed data types
Bregman divergences connect variational objectives