Towards Benchmarking Foundation Models for Tabular Data With Text

📅 2025-07-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing tabular benchmarks rarely include realistic textual columns, which hinders the evaluation of foundation models on text-rich tables. To address this, we propose a systematic benchmarking framework that explicitly supports textual fields: (1) We manually curate a collection of real-world tabular datasets featuring semantically rich textual columns; (2) We design ablation-style strategies that integrate textual features (via word embeddings, prompt injection, and other mechanisms) into conventional tabular modeling pipelines; (3) We conduct a unified evaluation across multiple state-of-the-art tabular foundation models. Experiments reveal significant performance disparities among mainstream models under text-enhanced settings. Our work delivers a reproducible benchmark, standardized evaluation protocols, and practical fusion strategies, and is a step towards better benchmarking of tabular foundation models on data with text.

📝 Abstract
Foundation models for tabular data are rapidly evolving, with increasing interest in extending them to support additional modalities such as free-text features. However, existing benchmarks for tabular data rarely include textual columns, and identifying real-world tabular datasets with semantically rich text features is non-trivial. We propose a series of simple yet effective ablation-style strategies for incorporating text into conventional tabular pipelines. Moreover, we benchmark how state-of-the-art tabular foundation models can handle textual data by manually curating a collection of real-world tabular datasets with meaningful textual features. Our study is an important step towards improving benchmarking of foundation models for tabular data with text.
Problem

Research questions and friction points this paper is trying to address.

Benchmarking foundation models for tabular data with text
Incorporating text into conventional tabular pipelines
Handling textual data in tabular foundation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ablation-style strategies for text integration
Benchmarking tabular models with text
Curating real-world datasets with text
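
To make the idea of an ablation-style strategy for text integration concrete, here is a minimal sketch. It is not the paper's pipeline: the column names (`price`, `review`, `label`), the toy rows, the bag-of-words featurizer (a simple stand-in for a word embedding), and the nearest-centroid classifier (a stand-in for a tabular model) are all assumptions made for illustration. The only point is the experimental shape: run the same pipeline once without and once with the text-derived features, and compare the scores.

```python
from collections import Counter

# Hypothetical toy table: one numeric column ("price") and one free-text
# column ("review"). None of these rows come from the paper's datasets.
TRAIN = [
    {"price": 10, "review": "great quality fast shipping", "label": 1},
    {"price": 10, "review": "broke after one day", "label": 0},
    {"price": 12, "review": "great value works well", "label": 1},
    {"price": 12, "review": "poor quality broke quickly", "label": 0},
]
TEST = [
    {"price": 11, "review": "great product", "label": 1},
    {"price": 11, "review": "broke immediately poor", "label": 0},
]


def build_vocab(rows):
    """Map every token seen in the training text column to a column index."""
    tokens = {tok for row in rows for tok in row["review"].lower().split()}
    return {tok: i for i, tok in enumerate(sorted(tokens))}


def featurize(row, vocab, use_text):
    """Numeric features, optionally followed by bag-of-words text features."""
    feats = [float(row["price"])]
    if use_text:
        counts = Counter(row["review"].lower().split())
        bow = [0.0] * len(vocab)
        for tok, n in counts.items():
            if tok in vocab:  # tokens unseen during training are ignored
                bow[vocab[tok]] = float(n)
        feats += bow
    return feats


def fit_centroids(X, y):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared distance."""
    def sqdist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def run_ablation(train, test, use_text):
    """Train and score one arm of the ablation; returns test accuracy."""
    vocab = build_vocab(train)
    X = [featurize(r, vocab, use_text) for r in train]
    y = [r["label"] for r in train]
    centroids = fit_centroids(X, y)
    hits = sum(
        predict(centroids, featurize(r, vocab, use_text)) == r["label"]
        for r in test
    )
    return hits / len(test)


for use_text in (False, True):
    print(f"use_text={use_text}: accuracy={run_ablation(TRAIN, TEST, use_text):.2f}")
```

In this toy table the prices are identical across classes, so the numeric-only arm has nothing to separate the labels with, while the text arm can use the review tokens. On real data the gap could be smaller, absent, or reversed, which is what a benchmark of this kind is meant to measure.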