MissDDIM: Deterministic and Efficient Conditional Diffusion for Tabular Data Imputation

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion-based methods for tabular missing data imputation—built upon stochastic Denoising Diffusion Probabilistic Models (DDPMs)—suffer from high inference latency and output instability. To address these limitations, this paper proposes the first deterministic conditional diffusion framework tailored for tabular data, introducing the Denoising Diffusion Implicit Models (DDIM) sampling scheme to this task. Our method employs a conditional diffusion architecture, conditioning the generation process on observed features, and leverages DDIM’s deterministic reverse process to reconstruct missing values without sampling variance—drastically reducing the number of denoising steps required. Evaluated on multiple real-world tabular datasets, our approach achieves up to an order-of-magnitude speedup in inference time, delivers highly consistent outputs across runs, and maintains imputation accuracy comparable to stochastic DDPMs. Key contributions include: (i) the first adaptation of DDIM to tabular missing data imputation; (ii) efficient, deterministic, and conditionally controllable generation; and (iii) a novel paradigm for practical deployment of diffusion models on structured data.
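The paper's exact architecture is not shown here, but the core mechanism it describes — a deterministic DDIM reverse step (η = 0) conditioned on observed features by re-imposing them at every denoising step — can be sketched as follows. The schedule, the placeholder `denoiser`, and all names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def ddim_impute(x_obs, mask, denoiser, alpha_bar, steps):
    """Sketch of a deterministic DDIM reverse process for tabular imputation.

    x_obs:     (d,) observed row, arbitrary values at missing positions
    mask:      (d,) 1.0 where observed, 0.0 where missing
    denoiser:  callable (x_t, t) -> predicted noise eps (placeholder here)
    alpha_bar: cumulative noise schedule, shape (T,), decreasing in t
    steps:     descending list of timesteps (DDIM uses a short subsequence,
               which is where the inference speedup comes from)
    """
    rng = np.random.default_rng(0)  # seeded: DDIM is deterministic given the initial noise
    x = rng.standard_normal(x_obs.shape)   # start missing entries from Gaussian noise
    x = mask * x_obs + (1 - mask) * x      # condition on observed features
    for i, t in enumerate(steps):
        a_t = alpha_bar[t]
        a_prev = alpha_bar[steps[i + 1]] if i + 1 < len(steps) else 1.0
        eps = denoiser(x, t)
        # Predict the clean row from the current noisy row and predicted noise.
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # eta = 0: fully deterministic DDIM update, no noise injected.
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
        # Re-impose observed values so only missing entries are generated.
        x = mask * x_obs + (1 - mask) * x
    return x
```

Because no noise is injected after the initial draw, repeated runs from the same starting noise produce identical imputations — the output-consistency property the summary highlights — while the short `steps` subsequence accounts for the reduced latency relative to a full DDPM chain.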

📝 Abstract
Diffusion models have recently emerged as powerful tools for missing data imputation by modeling the joint distribution of observed and unobserved variables. However, existing methods, typically based on stochastic denoising diffusion probabilistic models (DDPMs), suffer from high inference latency and variable outputs: while stochastic sampling enables diverse completions, it also introduces output variability that complicates downstream processing. To address these deficiencies, this paper presents MissDDIM, a conditional diffusion framework that adapts Denoising Diffusion Implicit Models (DDIM) for tabular imputation.
Problem

Research questions and friction points this paper is trying to address.

High inference latency of diffusion-based tabular data imputation
Variable outputs from stochastic DDPM sampling
Need for a deterministic, efficient conditional diffusion framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deterministic conditional diffusion for tabular data
Adapts Denoising Diffusion Implicit Models (DDIM)
Reduces inference latency and output variability