TimeAutoDiff: Combining Autoencoder and Diffusion model for time series tabular data synthesizing

📅 2024-06-23

🏛️ arXiv.org

📈 Citations: 15

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of a unified modeling framework for heterogeneous temporal tabular data—comprising continuous, binary, and categorical variables—across four key tasks: unconditional generation, missing value imputation, forecasting, and conditional generation based on time-varying metadata. We propose a novel hybrid generative framework integrating Variational Autoencoders (VAEs) and Denoising Diffusion Probabilistic Models (DDPMs). Our approach introduces joint latent-space modeling and a batched multi-sequence generation architecture, enabling entity-level conditional control while overcoming the sequential point-wise sampling bottleneck inherent in standard diffusion models. Evaluated on six public benchmarks, the method achieves state-of-the-art performance across four core metrics—fidelity (e.g., MSE, KL divergence) and utility (e.g., downstream task F1 score)—with sampling speed accelerated by an order of magnitude. To our knowledge, this is the first framework achieving high fidelity, high efficiency, and strong conditional controllability for general-purpose generation of multi-sequence heterogeneous temporal tabular data.

📝 Abstract

In this paper, we leverage the power of latent diffusion models to generate synthetic time series tabular data. Along with the temporal and feature correlations, the heterogeneous nature of the features in the table has been one of the main obstacles in time series tabular data modeling. We tackle this problem by combining the ideas of the variational auto-encoder (VAE) and the denoising diffusion probabilistic model (DDPM). Our model, named TimeAutoDiff, has several key advantages: (1) Generality: the ability to handle a broad spectrum of time series tabular data, from single-sequence to multi-sequence datasets; (2) Good fidelity and utility guarantees: numerical experiments on six publicly available datasets demonstrate significant improvements over state-of-the-art models in generating time series tabular data, across four metrics measuring fidelity and utility; (3) Fast sampling speed: the entire time series is generated at once, as opposed to the sequential data sampling schemes implemented in existing diffusion-based models, leading to significant improvements in sampling speed; (4) Entity conditional generation: the first implementation in the literature of conditional generation of multi-sequence time series tabular data with heterogeneous features, enabling scenario exploration across multiple scientific and engineering domains. The code is being prepared for public release and is available upon request.

Problem

Research questions and friction points this paper is trying to address.

Unifies generation, imputation, forecasting, and metadata conditioning for heterogeneous time series.

Handles mixed data types (continuous, binary, categorical) in tabular time series.

Improves scalability and speed via latent diffusion and feature compression.
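To make "mixed data types" concrete, the following is a minimal sketch of how one heterogeneous table might be turned into numeric inputs before it reaches an encoder. It is an illustration only, not the paper's preprocessing: the column names (`temp`, `alarm`, `state`), the helper names `fit_encoders` and `encode_row`, and the choice of min-max scaling plus integer codes are all assumptions.

```python
def fit_encoders(rows, continuous, categorical):
    """Collect min/max per continuous column and an integer code per category.

    Hypothetical helper: the scaling and coding scheme is an assumption.
    """
    ranges = {
        name: (min(r[name] for r in rows), max(r[name] for r in rows))
        for name in continuous
    }
    codes = {
        name: {value: i for i, value in enumerate(sorted({r[name] for r in rows}))}
        for name in categorical
    }
    return ranges, codes


def encode_row(row, ranges, codes):
    """Encode one row: scale continuous, integer-code categorical, 0/1 for binary."""
    encoded = {}
    for name, value in row.items():
        if name in ranges:
            low, high = ranges[name]
            # Guard against constant columns to avoid division by zero.
            encoded[name] = 0.0 if high == low else (value - low) / (high - low)
        elif name in codes:
            encoded[name] = codes[name][value]
        else:
            # Remaining columns are treated as binary flags.
            encoded[name] = int(value)
    return encoded


# Three time steps of one sequence with continuous, binary, and categorical features.
rows = [
    {"temp": 10.0, "alarm": False, "state": "idle"},
    {"temp": 20.0, "alarm": True, "state": "run"},
    {"temp": 15.0, "alarm": False, "state": "idle"},
]
ranges, codes = fit_encoders(rows, continuous=["temp"], categorical=["state"])
encoded = [encode_row(r, ranges, codes) for r in rows]
print(encoded)
```

Each feature type gets its own treatment here, which is the kind of per-type handling a heterogeneous table requires regardless of the specific model used downstream.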

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified latent-diffusion framework for multiple time-series tasks

Masked-modeling strategy with binary mask for observed and generated cells

VAE compresses features, diffusion model samples entire latent trajectory at once
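The binary-mask idea above can be sketched in a few lines: a mask marks which cells are observed (1) and which must be generated (0), and the final table keeps observed values while taking model output elsewhere. This is a simplified sketch under stated assumptions, not the authors' implementation: `make_mask` and `fill_with_generated` are hypothetical names, missing cells are assumed to be stored as NaN, and the `generated` table stands in for whatever the generative model would produce.

```python
import math


def make_mask(table):
    """Return 1 for observed cells and 0 for missing (NaN) cells."""
    return [[0 if math.isnan(value) else 1 for value in row] for row in table]


def fill_with_generated(observed, generated, mask):
    """Keep observed cells (mask == 1) and take generated cells elsewhere."""
    if not (len(observed) == len(generated) == len(mask)):
        raise ValueError("observed, generated, and mask must have the same number of rows")
    return [
        [o if m == 1 else g for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(observed, generated, mask)
    ]


nan = float("nan")
# Rows are time steps, columns are features; NaN marks a missing cell.
observed = [[1.0, nan], [nan, 4.0], [5.0, 6.0]]
# Placeholder values standing in for model output (assumed, not real results).
generated = [[0.9, 2.1], [3.2, 3.8], [5.1, 6.3]]

mask = make_mask(observed)
completed = fill_with_generated(observed, generated, mask)
print(mask)       # [[1, 0], [0, 1], [1, 1]]
print(completed)  # [[1.0, 2.1], [3.2, 4.0], [5.0, 6.0]]
```

The same mechanism covers several tasks by changing only the mask: an all-zero mask corresponds to unconditional generation, zeros at scattered cells to imputation, and zeros over the trailing time steps to forecasting.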
