Towards Benchmarking Foundation Models for Tabular Data With Text

📅 2025-07-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing tabular benchmarks largely lack realistic textual columns, which hinders the evaluation of foundation models on text-rich tables. To address this, we propose a systematic benchmarking framework that explicitly supports textual fields: (1) we manually curate a collection of real-world tabular datasets with semantically rich textual columns; (2) we design text-aware, ablation-style strategies that integrate textual features (via word embeddings, prompt injection, and other mechanisms) into conventional tabular modeling pipelines; (3) we run a unified evaluation across multiple state-of-the-art tabular foundation models. Experiments reveal significant performance disparities among mainstream models under text-enhanced settings. Our work delivers a reproducible benchmark, standardized evaluation protocols, and practical fusion strategies, thereby advancing multimodal tabular learning.
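
To make the idea of feeding text into a conventional tabular pipeline concrete, here is a minimal sketch of a text-on/text-off ablation. It is not the paper's implementation: the hashed bag-of-words vector is a toy stand-in for a real embedding model, and the names `embed_text`, `featurize`, and the `price`/`rating`/`review` columns are hypothetical, chosen only for illustration.

```python
import hashlib
from typing import Dict, List, Sequence


def embed_text(text: str, dim: int = 8) -> List[float]:
    """Toy stand-in for a text embedding: hashed bag-of-words, L2-normalised.

    Assumption: a real pipeline would call an embedding model here instead.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # md5 gives a stable hash, so features are reproducible across runs.
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm > 0 else vec


def featurize(
    rows: Sequence[Dict[str, object]],
    text_columns: Sequence[str],
    numeric_columns: Sequence[str],
    use_text: bool = True,
    dim: int = 8,
) -> List[List[float]]:
    """Build one flat feature vector per row.

    With use_text=False the text columns are dropped entirely, which gives
    the baseline arm of the ablation.
    """
    features = []
    for row in rows:
        feats = [float(row[c]) for c in numeric_columns]
        if use_text:
            for c in text_columns:
                feats.extend(embed_text(str(row[c]), dim))
        features.append(feats)
    return features


# Hypothetical two-row table with one free-text column.
rows = [
    {"price": 12.5, "rating": 4.0, "review": "great value, fast delivery"},
    {"price": 99.0, "rating": 2.0, "review": "broke after two days"},
]
with_text = featurize(rows, ["review"], ["price", "rating"], use_text=True)
without_text = featurize(rows, ["review"], ["price", "rating"], use_text=False)
```

Training the same downstream tabular model once on `with_text` and once on `without_text` would then show how much the text column contributes under this particular encoding.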

📝 Abstract

Foundation models for tabular data are rapidly evolving, with increasing interest in extending them to support additional modalities such as free-text features. However, existing benchmarks for tabular data rarely include textual columns, and identifying real-world tabular datasets with semantically rich text features is non-trivial. We propose a series of simple yet effective ablation-style strategies for incorporating text into conventional tabular pipelines. Moreover, we benchmark how state-of-the-art tabular foundation models can handle textual data by manually curating a collection of real-world tabular datasets with meaningful textual features. Our study is an important step towards improving benchmarking of foundation models for tabular data with text.

Problem

Research questions and friction points this paper is trying to address.

Benchmarking foundation models for tabular data with text

Incorporating text into conventional tabular pipelines

Handling textual data in tabular foundation models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Ablation-style strategies for text integration

Benchmarking tabular models with text

Curating real-world datasets with text

🔎 Similar Papers

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering