Causal Data Augmentation for Robust Fine-Tuning of Tabular Foundation Models

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of unstable fine-tuning in tabular foundation models under data-scarce conditions, where excessively small validation sets often render early stopping ineffective and hinder reliable estimation of generalization performance. To mitigate this, the study introduces structural causal models (SCMs) into tabular data augmentation for the first time, generating synthetic samples that preserve the underlying feature dependencies by modeling the causal structure of the target data. The proposed CausalMixFT algorithm substantially improves the correlation between validation and test performance. Evaluated across 33 classification datasets, it increases the median normalized ROC-AUC from 0.10 to 0.12 and narrows the median validation–test correlation gap from 0.67 to 0.30, outperforming existing methods such as CTGAN and TabEBM.

📝 Abstract
Fine-tuning tabular foundation models (TFMs) under data scarcity is challenging, as early stopping on even scarcer validation data often fails to capture true generalization performance. We propose CausalMixFT, a method that enhances fine-tuning robustness and downstream performance by generating structurally consistent synthetic samples using Structural Causal Models (SCMs) fitted on the target dataset. This approach augments limited real data with causally informed synthetic examples, preserving feature dependencies while expanding training diversity. Evaluated across 33 classification datasets from TabArena and over 2300 fine-tuning runs, our CausalMixFT method consistently improves median normalized ROC-AUC from 0.10 (standard fine-tuning) to 0.12, outperforming purely statistical generators such as CTGAN (-0.01), TabEBM (-0.04), and TableAugment (-0.09). Moreover, it narrows the median validation-test performance correlation gap from 0.67 to 0.30, enabling more reliable validation-based early stopping, a key step toward improving fine-tuning stability under data scarcity. These results demonstrate that incorporating causal structure into data augmentation provides an effective and principled route to fine-tuning tabular foundation models in low-data regimes.
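The source does not include CausalMixFT's implementation details. As an illustration only, the following minimal sketch shows the general idea of SCM-based tabular augmentation under a simplifying linear-Gaussian assumption with a known causal DAG; all function names, the toy graph, and the fitting procedure are hypothetical, not the paper's actual method (which would also handle structure learning and mixed feature types).

```python
import numpy as np

def fit_linear_scm(X, parents):
    """Fit a linear-Gaussian SCM: regress each feature on its parents.

    X       : (n, d) array of real samples
    parents : dict mapping feature index -> list of parent indices
              (an assumed causal DAG)
    Returns per-feature (weights, intercept, noise_std).
    """
    n, d = X.shape
    params = {}
    for j in range(d):
        pa = parents[j]
        if pa:
            A = np.column_stack([X[:, pa], np.ones(n)])
            coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
            resid = X[:, j] - A @ coef
            params[j] = (coef[:-1], coef[-1], resid.std())
        else:  # root node: model as a marginal Gaussian
            params[j] = (np.zeros(0), X[:, j].mean(), X[:, j].std())
    return params

def sample_scm(params, parents, order, n_samples, rng):
    """Generate synthetic rows by sampling features in topological order."""
    Z = np.zeros((n_samples, len(order)))
    for j in order:
        w, b, s = params[j]
        mean = b + (Z[:, parents[j]] @ w if len(w) else 0.0)
        Z[:, j] = mean + rng.normal(0.0, s, n_samples)
    return Z

# Toy example: chain X0 -> X1 -> X2
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 2.0 * x0 + rng.normal(scale=0.1, size=500)
x2 = -1.0 * x1 + rng.normal(scale=0.1, size=500)
X = np.column_stack([x0, x1, x2])

parents = {0: [], 1: [0], 2: [1]}
params = fit_linear_scm(X, parents)
X_syn = sample_scm(params, parents, order=[0, 1, 2], n_samples=500, rng=rng)

# Mix real and synthetic rows as the augmented fine-tuning set; the
# synthetic rows preserve the feature dependencies of the fitted SCM.
X_aug = np.vstack([X, X_syn])
```

Because each synthetic feature is generated from its causal parents, pairwise dependencies of the original data (e.g. the negative X1–X2 relationship above) carry over to the synthetic samples, which is the structural-consistency property the abstract attributes to SCM-based generation.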
Problem

Research questions and friction points this paper is trying to address.

data scarcity
fine-tuning
tabular foundation models
validation-test gap
generalization performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Data Augmentation
Structural Causal Models
Tabular Foundation Models
Fine-tuning under Data Scarcity
Synthetic Data Generation