🤖 AI Summary
Existing deep learning methods for tabular data employ separate, high-dimensional embeddings for numerical and categorical features, which hinders pre-trained knowledge transfer and cross-table generalization. This work proposes Tab2Text, a paradigm that systematically converts raw tabular data into natural language sequences and leverages large language models (LLMs) to produce unified, semantically rich embeddings that integrate seamlessly into mainstream architectures, including MLPs, ResNets, and FT-Transformers. To our knowledge, this is the first approach to directly incorporate LLM-derived pre-trained representations into end-to-end tabular deep learning, overcoming the limitations of traditional feature-wise encoding. By enabling holistic semantic modeling and plug-and-play enhancement, Tab2Text achieves state-of-the-art performance across seven standard classification benchmarks, consistently outperforming strong baselines such as MLP, ResNet, and FT-Transformer, with significant gains in predictive accuracy.
📝 Abstract
Tabular deep-learning methods require embedding numerical and categorical input features into high-dimensional spaces before processing them. Existing methods address the heterogeneous nature of tabular data by employing separate, type-specific encoding approaches, which limits cross-table transfer and the exploitation of pre-trained knowledge. We propose a novel approach that first transforms tabular data into text and then leverages pre-trained representations from LLMs to encode this data, resulting in a plug-and-play solution for improving deep-learning tabular methods. We demonstrate that our approach improves accuracy over competitive models, such as MLP, ResNet, and FT-Transformer, on seven classification datasets.
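The first step of the pipeline described above, serializing a tabular row into a natural-language sequence, can be sketched as follows. The exact template used by Tab2Text is not given in this summary, so the "column is value" phrasing and the downstream encoder call in the comments are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of the tabular-to-text serialization step (assumed template).

def row_to_text(row: dict) -> str:
    """Serialize one tabular row (column -> value) into a sentence-like string.

    The "col is val" phrasing is a hypothetical template; the paper's actual
    serialization format may differ.
    """
    parts = [f"{col} is {val}" for col, val in row.items()]
    return ". ".join(parts) + "."

# Example row from a census-style classification table.
row = {"age": 39, "workclass": "State-gov", "education": "Bachelors"}
text = row_to_text(row)
print(text)  # age is 39. workclass is State-gov. education is Bachelors.

# In the full pipeline, `text` would be passed to a frozen pre-trained LLM
# encoder to obtain a unified embedding, which then feeds a standard head
# such as an MLP, ResNet, or FT-Transformer.
```

Note that both numerical and categorical features pass through the same textual channel, which is what removes the need for separate type-specific encoders.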