How Well Does Your Tabular Generator Learn the Structure of Tabular Data?

📅 2025-03-12

🤖 AI Summary

Problem: Existing evaluation methods do not adequately assess whether tabular generative models learn the intrinsic structural properties of tabular data, particularly causal structure. Method: We propose TabStruct, the first benchmark centered on *structural fidelity*. It introduces *causal structure alignment* as a core evaluation criterion and establishes a structure-driven, task-agnostic, and domain-agnostic evaluation paradigm. Leveraging expert-validated causal graphs, we design a reproducible protocol for quantifying structural distance and systematically evaluate generators from eight categories on seven real-world datasets. Contribution/Results: Structural fidelity emerges as a critical evaluation dimension that is independent of downstream task performance. It sharpens model diagnosis and provides actionable guidance for improving generators. TabStruct thus establishes a principled, structurally grounded benchmark for rigorous assessment of tabular generative models.

📝 Abstract

Heterogeneous tabular data poses unique challenges in generative modelling due to its fundamentally different underlying data structure compared to homogeneous modalities, such as images and text. Although previous research has sought to adapt the successes of generative modelling in homogeneous modalities to the tabular domain, defining an effective generator for tabular data remains an open problem. One major reason is that the evaluation criteria inherited from other modalities often fail to adequately assess whether tabular generative models effectively capture or utilise the unique structural information encoded in tabular data. In this paper, we carefully examine the limitations of the prevailing evaluation framework and introduce $\textbf{TabStruct}$, a novel evaluation benchmark that positions structural fidelity as a core evaluation dimension. Specifically, TabStruct evaluates the alignment of causal structures in real and synthetic data, providing a direct measure of how effectively tabular generative models learn the structure of tabular data. Through extensive experiments using generators from eight categories on seven datasets with expert-validated causal graphical structures, we show that structural fidelity offers a task-independent, domain-agnostic evaluation dimension. Our findings highlight the importance of tabular data structure and offer practical guidance for developing more effective and robust tabular generative models. Code is available at https://github.com/SilenceX12138/TabStruct.
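
To make "alignment of causal structures" concrete, the sketch below compares a reference causal graph with a graph recovered from synthetic data. It is only an illustration under stated assumptions: the abstract does not say which distance TabStruct uses, so the structural Hamming distance (a common choice for comparing directed graphs) stands in here, and the function name, column names, and edges are all hypothetical. The step that recovers a graph from synthetic data is not shown.

```python
def structural_hamming_distance(reference_edges, candidate_edges):
    """Count node pairs whose edge status differs between two directed graphs.

    A missing edge, an extra edge, and a reversed edge each count as 1.
    Assumption: this metric is illustrative, not necessarily TabStruct's own.
    """
    reference_edges = set(reference_edges)
    candidate_edges = set(candidate_edges)
    distance = 0
    seen_pairs = set()
    for u, v in reference_edges | candidate_edges:
        pair = frozenset((u, v))
        if pair in seen_pairs:
            continue  # each unordered node pair is scored once
        seen_pairs.add(pair)
        both_directions = {(u, v), (v, u)}
        if both_directions & reference_edges != both_directions & candidate_edges:
            distance += 1
    return distance


# Hypothetical expert-validated graph for the real data.
reference_graph = [
    ("age", "income"),
    ("education", "income"),
    ("income", "spending"),
]

# Hypothetical graph recovered from a generator's synthetic data:
# one edge reversed, one edge missing, one edge added.
synthetic_graph = [
    ("age", "income"),
    ("income", "education"),
    ("education", "spending"),
]

shd = structural_hamming_distance(reference_graph, synthetic_graph)
print(f"Structural Hamming distance: {shd}")
```

A distance of 0 would mean the synthetic data supports exactly the same directed edges as the reference graph; larger values indicate that more of the causal structure was lost or distorted by the generator.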

Problem

Research questions and friction points this paper is trying to address.

Challenges in generative modeling for heterogeneous tabular data.

Inadequate evaluation criteria for tabular generative models.

Need for structural fidelity in tabular data generation.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the TabStruct benchmark for structural fidelity.

Evaluates causal structure alignment between real and synthetic tabular data.

Provides a task-independent, domain-agnostic evaluation dimension for tabular generative models.
