Robust Detection of Synthetic Tabular Data under Schema Variability

📅 2025-08-27
🤖 AI Summary
This work addresses the detection of synthetic tabular data in real-world scenarios where table structures vary and schemas are unknown at test time. Methodologically, it introduces a datum-wise transformer architecture augmented with a table-adaptation component, so that the detector can handle heterogeneous and previously unseen table structures rather than relying on a fixed schema. In experiments on diverse, multi-source tabular datasets, the datum-wise transformer improves both AUC and accuracy by 7 points over the only previously published baseline, and the table-adaptation component adds a further 7 accuracy points. These results improve detection reliability on unseen structures and provide the first strong evidence that synthetic-data detection for variable-schema tables is feasible.

📝 Abstract
The rise of powerful generative models has sparked concerns over data authenticity. While detection methods have been extensively developed for images and text, the case of tabular data, despite its ubiquity, has been largely overlooked. Yet, detecting synthetic tabular data is especially challenging due to its heterogeneous structure and unseen formats at test time. We address the underexplored task of detecting synthetic tabular data in the wild, where tables have variable and previously unseen schemas. We introduce a novel datum-wise transformer architecture that significantly outperforms the only previously published baseline, improving both AUC and accuracy by 7 points. By incorporating a table-adaptation component, our model gains an additional 7 accuracy points, demonstrating enhanced robustness. This work provides the first strong evidence that detecting synthetic tabular data in real-world conditions is not only feasible, but can be done with high reliability.
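The abstract reports gains in AUC and accuracy measured in points. As a reminder of what these two metrics measure for a real-versus-synthetic detector, here is a minimal sketch in plain Python. The labels and scores are toy values invented for illustration and are not results from the paper; the function names and the 0.5 threshold are assumptions as well.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of rows whose thresholded score matches the label (1 = synthetic)."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a synthetic row outscores a real one.

    Ties between a synthetic and a real score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one real and one synthetic example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only (0 = real, 1 = synthetic).
labels = [0, 0, 0, 1, 1, 1]
baseline_scores = [0.2, 0.6, 0.4, 0.3, 0.7, 0.8]
improved_scores = [0.2, 0.4, 0.3, 0.6, 0.7, 0.8]

baseline_acc = accuracy(labels, baseline_scores)
improved_acc = accuracy(labels, improved_scores)

# A gain "in points" is the difference between the two metrics on a 0-100 scale.
gain_in_points = 100 * (improved_acc - baseline_acc)
```

Accuracy depends on the chosen decision threshold, while AUC depends only on how the scores rank synthetic rows against real ones, which is presumably why the abstract reports both.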

Problem

Research questions and friction points this paper is trying to address.

Detecting synthetic tabular data with variable schemas
Addressing unseen table formats in real-world conditions
Improving robustness and accuracy in synthetic data detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel datum-wise transformer architecture
Incorporates table-adaptation component
Detects synthetic tabular data robustly
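
This page does not describe how the datum-wise transformer encodes its inputs. Purely as an illustration of why a per-datum view can cope with variable schemas, the sketch below turns each table row into a sequence of (column name, coarse type, value) items, so rows from tables with different columns share one representation. The names `Datum`, `coarse_type`, and `row_to_data` are hypothetical, and this encoding is an assumption rather than the authors' method.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical per-cell representation: (column name, coarse type, value as text).
Datum = Tuple[str, str, str]


def coarse_type(value: Any) -> str:
    """Map a cell value to a coarse type tag (an assumption for this sketch)."""
    if value is None:
        return "missing"
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    return "text"


def row_to_data(row: Dict[str, Any]) -> List[Datum]:
    """Serialize one row into a list of per-cell items, independent of the schema."""
    return [
        (str(column), coarse_type(value), "" if value is None else str(value))
        for column, value in row.items()
    ]


# Two rows from tables with unrelated schemas (toy values, invented).
patient_row = {"age": 54, "smoker": False, "diagnosis": "asthma"}
sales_row = {"sku": "A-17", "price": 3.5, "discount": None, "units": 12}

patient_data = row_to_data(patient_row)
sales_data = row_to_data(sales_row)
```

Both rows become variable-length sequences of the same item type, which is the kind of input a sequence model such as a transformer can consume without a fixed column layout. A fixed-schema detector, by contrast, would need one feature vector layout per table.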