🤖 AI Summary
To address the limited performance of conventional MLPs and the high cost of deep ensembles in supervised learning on tabular data, this paper proposes TabM: a parameter-efficient MLP architecture based on efficient ensembling. Its core idea is to internalize deep-ensemble principles into a single model: one TabM imitates an ensemble of MLPs and produces multiple predictions per object in a single forward pass, with the underlying implicit MLPs trained simultaneously and (by default) sharing most of their parameters. This yields significantly better performance and efficiency than a traditional deep ensemble. In a large-scale evaluation on public benchmarks, TabM achieves the best performance among tabular deep learning models, outperforming both Transformer-based and retrieval-augmented architectures while remaining markedly more efficient, thereby advancing the performance–efficiency trade-off in deep tabular learning.
📝 Abstract
Deep learning architectures for supervised learning on tabular data range from simple multilayer perceptrons (MLP) to sophisticated Transformers and retrieval-augmented methods. This study highlights a major, yet so far overlooked opportunity for designing substantially better MLP-based tabular architectures. Namely, our new model TabM relies on efficient ensembling, where one TabM efficiently imitates an ensemble of MLPs and produces multiple predictions per object. Compared to a traditional deep ensemble, in TabM, the underlying implicit MLPs are trained simultaneously, and (by default) share most of their parameters, which results in significantly better performance and efficiency. Using TabM as a new baseline, we perform a large-scale evaluation of tabular DL architectures on public benchmarks in terms of both task performance and efficiency, which renders the landscape of tabular DL in a new light. Generally, we show that MLPs, including TabM, form a line of stronger and more practical models compared to attention- and retrieval-based architectures. In particular, we find that TabM demonstrates the best performance among tabular DL models. Then, we conduct an empirical analysis on the ensemble-like nature of TabM. We observe that the multiple predictions of TabM are weak individually, but powerful collectively. Overall, our work brings an impactful technique to tabular DL and advances the performance-efficiency trade-off with TabM -- a simple and powerful baseline for researchers and practitioners.
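The core mechanism, one model that imitates an ensemble of MLPs while sharing most parameters and emitting multiple predictions per object, can be sketched as follows. This is a minimal illustration using per-member element-wise adapters around shared weight matrices (a BatchEnsemble-style parameterization); the exact parameterization used by TabM may differ, and all class and variable names here are hypothetical.

```python
import numpy as np

class EfficientEnsembleMLP:
    """Sketch: one MLP imitating k ensemble members in a single forward pass.

    The hidden weight matrices are shared across all k implicit members; each
    member only owns cheap element-wise adapter vectors and its own output
    head, so the parameter count grows only slightly with k.
    """

    def __init__(self, d_in, d_hidden, d_out, k, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # Shared parameters (one set for all k implicit members).
        self.W1 = rng.normal(0, 0.1, (d_in, d_hidden))
        self.W2 = rng.normal(0, 0.1, (d_hidden, d_hidden))
        # Per-member adapters: element-wise input/output scales.
        self.r1 = rng.normal(1, 0.1, (k, d_in))
        self.s1 = rng.normal(1, 0.1, (k, d_hidden))
        self.r2 = rng.normal(1, 0.1, (k, d_hidden))
        self.s2 = rng.normal(1, 0.1, (k, d_hidden))
        # Per-member output heads: k predictions per object.
        self.heads = rng.normal(0, 0.1, (k, d_hidden, d_out))

    def forward(self, x):
        # x: (batch, d_in) -> replicate once per member: (batch, k, d_in)
        h = np.repeat(x[:, None, :], self.k, axis=1)
        # Each layer: member-specific input scale, shared matrix, member-
        # specific output scale, then ReLU.
        h = np.maximum(0, (h * self.r1) @ self.W1 * self.s1)
        h = np.maximum(0, (h * self.r2) @ self.W2 * self.s2)
        # (batch, k, d_hidden) x (k, d_hidden, d_out) -> (batch, k, d_out)
        return np.einsum('bkh,kho->bko', h, self.heads)

model = EfficientEnsembleMLP(d_in=8, d_hidden=16, d_out=1, k=4)
x = np.random.default_rng(1).normal(size=(32, 8))
per_member = model.forward(x)         # k predictions per object: (32, 4, 1)
prediction = per_member.mean(axis=1)  # aggregate the members: (32, 1)
```

As in the abstract's analysis, the per-member predictions may be weak individually, while their aggregate is the strong final prediction; training optimizes all k members jointly in one backward pass.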