A Systematic Framework for Tabular Data Disentanglement

📅 2026-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing disentanglement methods for tabular data, such as factor analysis, CT-GAN, and VAEs, suffer from scalability issues, mode collapse, and poor extrapolation, which limits how well they model complex inter-attribute dependencies. This work proposes a systematic framework for tabular data disentanglement that modularizes the process into four components: data extraction, data modeling, model analysis, and latent representation extrapolation. The framework is motivated by the observation that directly adapting disentanglement approaches developed for images or text gives suboptimal results on tabular data. It offers a common view of existing techniques such as VAEs and CT-GAN, and its applicability is demonstrated through a case study on synthetic tabular data generation.
📝 Abstract
Tabular data, widely used in applications such as industrial control systems, finance, and supply chain, often contains complex interrelationships among its attributes. Data disentanglement seeks to transform such data into latent variables with reduced interdependencies, facilitating more effective and efficient processing. Despite extensive studies on data disentanglement for image, text, and audio data, tabular data disentanglement requires further investigation because of the more intricate attribute interactions typically found in tabular data. Moreover, because of these highly complex interrelationships, directly transferring methods from other data domains results in suboptimal disentanglement. Existing tabular data disentanglement methods, such as factor analysis, CT-GAN, and VAE, face limitations including scalability issues, mode collapse, and poor extrapolation. In this paper, we propose a framework that provides a systematic view of tabular data disentanglement and modularizes the process into four core components: data extraction, data modeling, model analysis, and latent representation extrapolation. We believe this work provides a deeper understanding of tabular data disentanglement and existing methods, and lays the foundation for future research on robust, efficient, and scalable data disentanglement techniques. Finally, we demonstrate the framework's applicability through a case study on synthetic tabular data generation, showcasing its potential for the downstream task of data synthesis.
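To make the four components concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not the paper's implementation: PCA (computed via SVD) is swapped in as a simple linear stand-in for the data modeling step, the toy table and all function names (`extract`, `fit_model`, `encode`, `decode`, `max_offdiag_corr`) are hypothetical, and "reduced interdependencies" is approximated here by pairwise linear correlation only.

```python
import numpy as np

def extract(raw):
    """Data extraction (assumed form): standardize each numeric column."""
    mean, std = raw.mean(axis=0), raw.std(axis=0)
    return (raw - mean) / std, mean, std

def fit_model(x):
    """Data modeling: PCA via SVD, a linear stand-in for a VAE or CT-GAN."""
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt  # rows are orthonormal principal directions

def encode(x, vt):
    return x @ vt.T  # project attributes onto latent axes

def decode(z, vt):
    return z @ vt  # map latent coordinates back to attribute space

def max_offdiag_corr(m):
    """Model analysis: largest absolute correlation between two distinct columns."""
    corr = np.corrcoef(m, rowvar=False)
    return np.abs(corr - np.diag(np.diag(corr))).max()

# Toy table (hypothetical): three attributes built by mixing independent factors.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
mix = np.array([[1.0, 0.8, 0.0],
                [0.0, 1.0, 0.5],
                [0.0, 0.0, 1.0]])
raw = base @ mix

x, mean, std = extract(raw)   # 1. data extraction
vt = fit_model(x)             # 2. data modeling
z = encode(x, vt)

# 3. model analysis: attributes are correlated, latent variables are not.
corr_attributes = max_offdiag_corr(raw)
corr_latents = max_offdiag_corr(z)

# 4. latent representation extrapolation: push one latent coordinate
#    beyond its observed range, then decode back to the original units.
z_new = z[0].copy()
z_new[0] = 1.5 * z[:, 0].max()
row_new = decode(z_new, vt) * std + mean

print(f"max |corr| among attributes: {corr_attributes:.3f}")
print(f"max |corr| among latents:    {corr_latents:.2e}")
print("extrapolated row:", row_new)
```

A linear model like this only removes linear correlation between latent variables. The nonlinear dependencies that motivate methods such as VAEs and CT-GAN would need a richer model in step 2, which is the kind of swap a modular design is meant to allow.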
Problem

Research questions and friction points this paper is trying to address.

tabular data
data disentanglement
attribute interactions
scalability
extrapolation
Innovation

Methods, ideas, or system contributions that make the work stand out.

tabular data disentanglement
systematic framework
latent representation
modular architecture
synthetic data generation
Ivan Tjuawinata
Research Fellow, Nanyang Technological University
Multiparty Computation, Privacy Preserving Scheme, Coding Theory, Cryptanalysis
Andre Gunawan
Nanyang Technological University, Singapore
Anh Quan Tran
Nanyang Technological University, Singapore
Nitish Kumar
Mastercard, India
Payal Pote
Mastercard, India
Harsh Bansal
Mastercard, India
Chu-Hung Chi
Nanyang Technological University, Singapore
Kwok-Yan Lam
Nanyang Technological University
Cybersecurity, Privacy-Preserving technologies, Digital Trust, Distributing systems, LegalTech
Parventanis Murthy
Nanyang Technological University, Singapore