🤖 AI Summary
This work addresses the performance gap between synthetic tabular data generated by diffusion models and real data in downstream tasks—commonly referred to as the synthetic-to-real gap—by introducing TARDIS, a framework that optimizes the outputs of a frozen pre-trained diffusion model during inference without requiring retraining. TARDIS pioneers inference-time optimization for tabular synthesis, formalizing and employing bidirectional Chamfer refinement (BCR) as its core strategy, complemented by score-based guidance via tree-structured Parzen estimators, post-hoc sample selection, and optional soft-label distillation. Evaluated across 15 benchmark tasks, TARDIS achieves a median downstream performance surpassing that of real data by 8.6% and outperforms the base TabDiff model by 12.9% on average, while preserving high fidelity, diversity, and privacy, and running efficiently within 1–80 minutes on a single consumer-grade GPU.
📝 Abstract
Diffusion-based generators set the current state of the art for synthetic tabular data. These methods approach but rarely exceed real-data utility, and closing this synthetic-real gap has so far been pursued exclusively at training time, via architectural advances, scaling, and retraining of monolithic generators. The inference-time alternative, i.e., refining the outputs of a pre-trained backbone with parameters left untouched, has remained largely unexplored for tabular synthesis. We introduce TARDIS (Tabular generation through Refinement, Distillation, and Inference-time Sampling), an inference-time refinement framework that operates on a frozen pre-trained backbone, configured per dataset by a Tree-structured Parzen Estimator search over score-level guidance during reverse diffusion, with each trial's objective set by an inner grid search over post-hoc sample selectors and an optional soft-label distillation step. The search space encodes a single mathematical pattern we name Bidirectional Chamfer Refinement (BCR): the symmetric Chamfer functional between synthetic and real samples is minimized both continuously, via a score-level gradient, and discretely, via batch-ranking post-generation. The per-dataset search recovers BCR-aligned configurations on most datasets, evidence for BCR as the dominant refinement pattern. Across 15 binary, multiclass, and regression benchmarks TARDIS achieves a median +8.6% downstream-task improvement over models trained on real data (95% CI [+3.3, +16.4], Wilcoxon p=0.016, 11/15 strict wins) and improves over the TabDiff backbone on all 15 datasets (mean +12.9%, p<10^-4), matching the backbone on manifold fidelity, diversity, and sample-level privacy. Inference-time refinement of a pre-trained tabular diffusion backbone reaches and exceeds real-data utility in 1 to 80 minutes on a single consumer-grade GPU.