🤖 AI Summary
To address challenges in tabular data—including missing values, noise, heterogeneous structures, variable column counts, and absence of metadata—this paper proposes Basis Transformers, a novel architecture for multi-task tabular regression. Methodologically, it is the first to introduce a basis decomposition mechanism into tabular Transformers, explicitly modeling column-name semantics and value-level scale invariance without requiring pretraining or metadata. The design integrates column-aware basis vector decomposition, a lightweight attention module, and task-shared parametric representations. Evaluated on 34 regression tasks from OpenML-CTR23, Basis Transformers achieves a median R² improvement of 0.338 over strong baselines, attains the lowest standard deviation across tasks, and uses only one-fifth the parameters of the best-performing baseline. Notably, with random initialization alone, it surpasses fine-tuned large language models, demonstrating superior parameter efficiency, robustness, and generalization without reliance on external knowledge or costly adaptation.
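To make the idea of a column-aware basis decomposition concrete, here is a minimal illustrative sketch (not the paper's actual code; all names, shapes, and the weighting scheme are assumptions): each cell embedding is built as a weighted combination of a small set of shared basis vectors, with the mixture weights derived from an embedding of the column name and the numeric value acting as a scale factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
n_basis, d_model, d_name = 4, 8, 6

basis = rng.normal(size=(n_basis, d_model))   # shared basis vectors (task-agnostic)
W_name = rng.normal(size=(d_name, n_basis))   # maps column-name embedding -> basis weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def embed_cell(col_name_emb, value):
    """Combine shared basis vectors with column-aware weights, scaled by the cell value."""
    weights = softmax(col_name_emb @ W_name)  # (n_basis,) mixture over basis vectors
    return value * (weights @ basis)          # (d_model,) embedding for this cell

col_emb = rng.normal(size=d_name)
e1 = embed_cell(col_emb, 2.0)
e2 = embed_cell(col_emb, 4.0)
# In this sketch the embedding is linear in the value, so rescaling the value
# rescales the embedding by the same factor, a simple, explicit scale relation.
print(np.allclose(e2, 2.0 * e1))  # True
```

This sketch only illustrates the decomposition pattern; the paper's actual treatment of numeric scale invariance and its attention module are not reproduced here.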
📝 Abstract
Dealing with tabular data is challenging due to partial information, noise, and heterogeneous structure. Existing techniques often struggle to simultaneously address key aspects of tabular data such as textual information, a variable number of columns, and unseen data with no metadata beyond column names. We propose a novel architecture, *basis transformers*, specifically designed to tackle these challenges while respecting inherent invariances in tabular data, including hierarchical structure and the representation of numeric values. We evaluate our design on a multi-task tabular regression benchmark, achieving an improvement of 0.338 in the median $R^2$ score and the lowest standard deviation across 34 tasks from the OpenML-CTR23 benchmark. Furthermore, our model has five times fewer parameters than the best-performing baseline and surpasses pretrained large language model baselines -- even when initialized from randomized weights.