🤖 AI Summary
Standard variational autoencoders (VAEs) struggle to model complex inter-feature relationships in tabular data generation, particularly when dealing with mixed data types. This work systematically investigates how integrating Transformer modules into different components of the VAE—namely the encoder, decoder, and latent space—affects generation performance. Large-scale experiments across 57 tabular datasets from the OpenML CC18 benchmark demonstrate that incorporating Transformers into the latent representation and decoder substantially improves generation quality, while also revealing that the architectural placement of these modules shapes the fidelity–diversity trade-off. Furthermore, the study finds that consecutive Transformer layers exhibit high internal similarity and that the decoder displays nearly linear input–output behavior, offering key insights for designing efficient generative models for tabular data.
📝 Abstract
Tabular data remains a challenging domain for generative models. In particular, the standard Variational Autoencoder (VAE) architecture, typically composed of multilayer perceptrons, struggles to model relationships between features, especially when handling mixed data types. In contrast, Transformers, through their attention mechanism, are better suited for capturing complex feature interactions. In this paper, we empirically investigate the impact of integrating Transformers into different components of a VAE. We conduct experiments on 57 datasets from the OpenML CC18 suite and draw two main conclusions. First, results indicate that positioning Transformers to leverage latent and decoder representations leads to a trade-off between fidelity and diversity. Second, we observe a high similarity between consecutive blocks of a Transformer in all components. In particular, in the decoder, the relationship between the input and output of a Transformer is approximately linear.
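To make the architectural idea concrete, below is a minimal, illustrative sketch (not the paper's implementation) of a single self-attention block applied to per-feature latent tokens before a linear decoder head. All names and dimensions (`n_feat`, `d`, the weight matrices) are hypothetical; the point is only to show how attention in the latent space lets each feature's representation attend to every other feature's, which MLP-only VAEs cannot do directly.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # tokens: (n_feat, d) -- one latent token per tabular feature.
    # Each feature's token attends to every other feature's token,
    # modeling inter-feature dependencies explicitly.
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return scores @ V

rng = np.random.default_rng(0)
n_feat, d = 5, 8                                  # hypothetical sizes
z = rng.standard_normal((n_feat, d))              # latent tokens from the encoder
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

h = z + self_attention(z, Wq, Wk, Wv)             # residual attention in the latent space
x_hat = h @ rng.standard_normal((d, 1))           # per-feature linear decoder head
print(x_hat.shape)                                # one reconstructed value per feature
```

In a real model the attention weights would be learned and each Transformer block would include layer normalization and a feed-forward sublayer; the residual form `z + attention(z)` also hints at why a near-linear input–output relationship in the decoder, as the paper observes, is plausible.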