End-to-End Compression for Tabular Foundation Models

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes TACO, an end-to-end learnable data-compression framework for tabular foundation models. It addresses the prohibitive quadratic complexity of attention-based architectures, which makes training and inference inefficient and drives excessive memory consumption at scale. TACO compresses the entire training dataset into a compact latent representation and leverages in-context learning to make predictions in a single forward pass, without parameter updates or direct access to the original data. According to the authors, this is the first approach to achieve end-to-end compression of training data for tabular foundation models. Evaluated on the TabArena benchmark, TACO achieves up to a 94× inference speedup and reduces memory usage by 97%, while maintaining or even improving performance on larger-scale datasets, thereby overcoming the scalability limitations inherent in conventional Transformer architectures.
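The summary above describes the mechanism only at a high level: summarize the training set into a small latent context, then let each query attend to that context instead of the full dataset. The page does not specify TACO's actual architecture, so the following is a minimal NumPy sketch of the general idea using Perceiver-style cross-attention; all sizes, variable names, and the random "embeddings" are illustrative assumptions, not the paper's method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Scaled dot-product cross-attention: each query row attends over keys/values.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)    # (n_q, n_k)
    return softmax(scores, axis=-1) @ values  # (n_q, d)

rng = np.random.default_rng(0)
n_train, n_latent, d = 1000, 16, 32

# Hypothetical stand-ins: embedded training rows and a learned latent array
# (random here; in a real model both would come from trained parameters).
train_tokens = rng.normal(size=(n_train, d))
latents = rng.normal(size=(n_latent, d))

# Compression step: the whole training set is summarized into n_latent vectors.
compressed = cross_attention(latents, train_tokens, train_tokens)

# Prediction step: a query row attends only to the compressed context,
# so per-query cost depends on n_latent, not on n_train.
query = rng.normal(size=(1, d))
context_for_query = cross_attention(query, compressed, compressed)
print(compressed.shape, context_for_query.shape)  # (16, 32) (1, 32)
```

Once the compressed context is built, it can be cached and reused for every incoming query, which is what makes single-forward-pass inference cheap and removes the need to keep the original rows around.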

📝 Abstract
The long-standing dominance of gradient-boosted decision trees on tabular data has recently been challenged by in-context learning tabular foundation models. In-context learning methods fit and predict in a single forward pass without parameter updates, using the training data as context for predictions on query test points. While recent tabular foundation models achieve state-of-the-art performance, their transformer architecture, built on the attention mechanism, has quadratic complexity in dataset size, which increases training and inference time and limits the models' capacity to handle large-scale datasets. In this work, we propose TACO, an end-to-end tabular compression model that compresses the training dataset into a latent space. We test our method on the TabArena benchmark, where it is up to 94× faster at inference while consuming up to 97% less memory than the state-of-the-art tabular transformer architecture, all while retaining performance without significant degradation. Finally, our method not only scales better with increasing dataset size, but also achieves better performance than other baselines.
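The quadratic-vs-linear scaling claim in the abstract can be made concrete by counting attention score entries. The token counts and latent size below are illustrative assumptions, not the paper's actual settings or its measured 94× figure:

```python
# Back-of-the-envelope attention cost, counted as score-matrix entries.
# Full in-context attention mixes train and query tokens jointly, so its
# cost grows quadratically with dataset size; attending to a fixed
# compressed context grows only linearly.
def full_cost(n_train, n_query):
    n = n_train + n_query
    return n * n  # O((n_train + n_query)^2)

def compressed_cost(n_train, n_query, n_latent):
    # One cross-attention pass to build the latents,
    # plus each query attending to the n_latent compressed vectors.
    return n_train * n_latent + n_query * n_latent

n_train, n_query, n_latent = 50_000, 1_000, 128
ratio = full_cost(n_train, n_query) / compressed_cost(n_train, n_query, n_latent)
print(round(ratio, 1))  # ~398.4 at these (hypothetical) sizes
```

The gap widens as `n_train` grows, which matches the abstract's point that compression improves scaling on large datasets rather than just shaving a constant factor.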
Problem

Research questions and friction points this paper is trying to address.

tabular foundation models
in-context learning
attention mechanism
quadratic complexity
large-scale datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

tabular foundation models
in-context learning
end-to-end compression
latent space
efficient inference