🤖 AI Summary
Heterogeneous, high-dimensional, and sparse tabular customer data make effective entity representation learning difficult. Method: This paper proposes DEEPCAE, a multi-layer contractive autoencoder (CAE) framework for general-purpose entity embedding. Its core contributions are (i) a formalization of a general-purpose entity embedding framework and (ii) a novel method for calculating the contractive regularization term in multi-layer CAEs. Results: Experiments across 13 real-world customer datasets show that DEEPCAE reduces reconstruction error by 34% compared to a stacked CAE and outperforms all other tested autoencoder variants on downstream prediction tasks, including classification and regression, supporting its effectiveness and generality.
📝 Abstract
Recent advances in representation learning have successfully leveraged the underlying domain-specific structure of data across various fields. However, representing diverse and complex entities stored in tabular format within a latent space remains challenging. In this paper, we introduce DEEPCAE, a novel method for calculating the regularization term for multi-layer contractive autoencoders (CAEs). Additionally, we formalize a general-purpose entity embedding framework and use it to empirically show that DEEPCAE outperforms all other tested autoencoder variants in both reconstruction performance and downstream prediction performance. Notably, when compared to a stacked CAE across 13 datasets, DEEPCAE achieves a 34% improvement in reconstruction error.
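For context on the regularization term the abstract refers to: a contractive autoencoder (Rifai et al., 2011) penalizes the squared Frobenius norm of the encoder's Jacobian, encouraging the learned representation to be locally insensitive to input perturbations. The sketch below shows this standard single-layer penalty for a sigmoid encoder, where the closed form is well known; it illustrates the quantity being regularized, not DEEPCAE's multi-layer computation, and the function names are illustrative only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cae_penalty(x, W, b):
    """Contractive penalty ||J_f(x)||_F^2 for a single-layer sigmoid
    encoder h = sigmoid(W @ x + b).

    Since dh_j/dx_i = h_j * (1 - h_j) * W[j, i], the squared Frobenius
    norm factorizes (Rifai et al., 2011):
        ||J||_F^2 = sum_j (h_j * (1 - h_j))^2 * sum_i W[j, i]^2
    """
    h = sigmoid(W @ x + b)
    return np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1))

# Sanity check against the explicitly built Jacobian.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
h = sigmoid(W @ x + b)
J = (h * (1.0 - h))[:, None] * W          # full Jacobian dh/dx
print(np.isclose(cae_penalty(x, W, b), np.sum(J ** 2)))  # True
```

A "stacked" CAE applies this penalty layer by layer during greedy pretraining; the abstract's claim is that DEEPCAE instead computes the regularization term for the multi-layer encoder as a whole, which is what yields the reported 34% reconstruction improvement.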