🤖 AI Summary
This study systematically investigates when and how fine-tuning improves the performance, calibration, and fairness of Tabular Foundation Models (TFMs). Through comprehensive evaluation across benchmarks including TALENT, OpenML-CC18, and TabZilla, the work compares zero-shot inference, meta-learning, full supervised fine-tuning (SFT), and parameter-efficient fine-tuning (PEFT). The findings reveal that zero-shot TFMs already exhibit strong performance, while SFT often degrades accuracy or calibration, and meta-learning and PEFT yield only marginal gains under specific data conditions. The work is the first to show that fine-tuning is not universally beneficial, and it provides practical, data-characteristic-driven guidelines that delineate where fine-tuning helps and where it does not.
📝 Abstract
Tabular Foundation Models (TFMs) have recently shown strong in-context learning capabilities on structured data, achieving zero-shot performance comparable to traditional machine learning methods. This work presents the first comprehensive study of fine-tuning in TFMs across benchmarks including TALENT, OpenML-CC18, and TabZilla. We compare zero-shot, meta-learning, full supervised fine-tuning (SFT), and parameter-efficient fine-tuning (PEFT) approaches, analyzing how dataset factors such as imbalance, size, and dimensionality affect outcomes. We find that zero-shot TFMs already achieve strong performance, and that the benefits of fine-tuning are highly model- and data-dependent: meta-learning and PEFT provide moderate gains under specific conditions, whereas full SFT often reduces accuracy or calibration quality. Our findings cover performance, calibration, and fairness, offering practical guidelines on when fine-tuning is most beneficial and where its limitations lie.
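The data-characteristic-driven guidelines described above could be operationalized as a simple decision heuristic. The sketch below is purely illustrative: the threshold values and branch conditions are hypothetical placeholders, not results reported by the study; only the broad direction (zero-shot is a strong default, PEFT and meta-learning help in specific regimes, full SFT is risky) follows the paper's findings.

```python
# Illustrative sketch of a data-characteristic-driven guideline for adapting
# a Tabular Foundation Model (TFM). All thresholds are hypothetical.

def choose_adaptation(n_samples: int, n_features: int, imbalance_ratio: float) -> str:
    """Suggest an adaptation strategy for a TFM from dataset characteristics.

    imbalance_ratio: majority-class count / minority-class count (>= 1).
    """
    # Small datasets: zero-shot in-context inference already performs well,
    # and fine-tuning risks overfitting and degraded calibration.
    if n_samples < 1_000:
        return "zero-shot"
    # High dimensionality or strong imbalance: lightweight PEFT adapters
    # may yield moderate gains without destabilizing the pretrained model.
    if n_features > 100 or imbalance_ratio > 10:
        return "peft"
    # Larger, balanced, lower-dimensional data: meta-learning may help;
    # full SFT is avoided since it often hurts accuracy or calibration.
    return "meta-learning"

print(choose_adaptation(500, 20, 1.5))      # -> zero-shot
print(choose_adaptation(50_000, 200, 2.0))  # -> peft
print(choose_adaptation(50_000, 30, 1.2))   # -> meta-learning
```

The heuristic mirrors the study's headline conclusion: fine-tuning is not universally beneficial, so the default is the cheapest option (zero-shot), escalating only when dataset characteristics suggest a likely gain.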