LLM Embeddings for Deep Learning on Tabular Data

📅 2025-02-17
🤖 AI Summary
Existing deep learning methods for tabular data employ separate, high-dimensional embeddings for numerical and categorical features, hindering pre-trained knowledge transfer and cross-table generalization. This work proposes Tab2Text: a paradigm that systematically converts raw tabular data into natural language sequences, leveraging large language models (LLMs) to produce unified, semantically rich embeddings seamlessly integrated into mainstream architectures—including MLPs, ResNets, and FT-Transformers. To our knowledge, this is the first approach to directly incorporate LLM-derived pre-trained representations into end-to-end tabular deep learning, overcoming the limitations of traditional feature-wise encoding. By enabling holistic semantic modeling and plug-and-play enhancement, Tab2Text achieves state-of-the-art performance across seven standard classification benchmarks, consistently outperforming strong baselines such as MLP, ResNet, and FT-Transformer, with significant gains in predictive accuracy.

📝 Abstract
Tabular deep-learning methods require embedding numerical and categorical input features into high-dimensional spaces before processing them. Existing methods deal with the heterogeneous nature of tabular data by employing separate type-specific encoding approaches. This limits the cross-table transfer potential and the exploitation of pre-trained knowledge. We propose a novel approach that first transforms tabular data into text, and then leverages pre-trained representations from LLMs to encode this data, resulting in a plug-and-play solution for improving deep-learning tabular methods. We demonstrate that our approach improves accuracy over competitive models, such as MLP, ResNet, and FT-Transformer, by validating on seven classification datasets.
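
The two steps described in the abstract (turn each row into text, then encode that text with a pre-trained model) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the "<column> is <value>" template, the column names, and the helper names `row_to_text`, `embed_rows`, and `toy_encode` are all assumptions, and `toy_encode` is only an offline stand-in for a real LLM encoder.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def row_to_text(row: Row) -> str:
    """Serialize one table row as a sentence-like string.

    The "<column> is <value>" template is an assumption for illustration;
    the source does not specify the exact wording.
    """
    return ". ".join(f"{col} is {val}" for col, val in row.items()) + "."


def embed_rows(
    rows: Sequence[Row],
    encode: Callable[[List[str]], List[List[float]]],
) -> List[List[float]]:
    """Turn rows into text, then into one fixed-size vector per row."""
    texts = [row_to_text(r) for r in rows]
    return encode(texts)


def toy_encode(texts: List[str], dim: int = 8) -> List[List[float]]:
    """Stand-in for an LLM encoder (hypothetical, NOT a real model).

    Buckets character codes into `dim` bins and normalizes the counts,
    so the example runs offline without downloading any weights.
    """
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for ch in text:
            vec[ord(ch) % dim] += 1.0
        total = sum(vec) or 1.0
        vectors.append([v / total for v in vec])
    return vectors


# Hypothetical rows mixing numerical and categorical features.
rows = [
    {"age": 39, "occupation": "teacher", "hours_per_week": 40},
    {"age": 52, "occupation": "engineer", "hours_per_week": 45},
]
texts = [row_to_text(r) for r in rows]
features = embed_rows(rows, toy_encode)
```

Because numerical and categorical columns pass through the same text path, no type-specific encoder is needed. In a real setting, `toy_encode` would be replaced by a call to a pre-trained text-embedding model, and `features` would be passed to the downstream network (for example an MLP) as its input representation.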

Problem

Research questions and friction points this paper is trying to address.

Embedding tabular data for deep learning
Leveraging LLMs for data encoding
Improving accuracy in classification tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM embeddings for tabular data
Text transformation of tabular data
Plug-and-play deep-learning solution