Causal Pre-training Under the Fairness Lens: An Empirical Study of TabPFN

📅 2026-01-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how causal pre-training affects the fairness of the Tabular Prior-data Fitted Network (TabPFN), particularly under distributional shifts such as missing-not-at-random (MNAR) conditions. TabPFN is pre-trained on large-scale synthetic data generated from structural causal models, and the authors systematically evaluate its predictive accuracy, robustness, and fairness. The work provides the first systematic empirical evidence that, while causal pre-training substantially improves accuracy and robustness to spurious correlations, it yields only limited and inconsistent fairness gains across different distribution shifts. These findings offer an empirical foundation for understanding the nuanced relationship between causal representation learning and fairness in tabular machine learning.

📝 Abstract
Foundation models for tabular data, such as the Tabular Prior-data Fitted Network (TabPFN), are pre-trained on a massive number of synthetic datasets generated by structural causal models (SCMs). They leverage in-context learning to offer high predictive accuracy in real-world tasks. However, the fairness properties of these foundation models, which incorporate ideas from causal reasoning during pre-training, remain underexplored. In this work, we conduct a comprehensive empirical evaluation of TabPFN and its fine-tuned variants, assessing predictive performance, fairness, and robustness across varying dataset sizes and distributional shifts. Our results reveal that while TabPFN achieves stronger predictive accuracy than baselines and exhibits robustness to spurious correlations, improvements in fairness are moderate and inconsistent, particularly under missing-not-at-random (MNAR) covariate shifts. These findings suggest that the causal pre-training in TabPFN is helpful but insufficient for algorithmic fairness, highlighting implications for deploying TabPFN (and similar) models in practice and the need for further fairness interventions.
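The ingredients in the abstract — an SCM that generates data, MNAR covariate missingness, and a group-fairness metric — can be illustrated with a toy sketch. This is not the paper's setup: the linear SCM, the threshold classifier, and all coefficients below are hypothetical, and TabPFN's actual SCM prior is far richer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy structural causal model: sensitive attribute A causes feature X,
# and both A and X cause the label Y.
a = rng.integers(0, 2, size=n)                     # sensitive attribute
x = 0.8 * a + rng.normal(size=n)                   # feature caused by A
y = (0.5 * x + 0.3 * a + rng.normal(scale=0.5, size=n) > 0.5).astype(int)

# MNAR covariate shift: X is more likely to be missing when X itself is
# large, so missingness depends on the unobserved value.
miss_prob = 1.0 / (1.0 + np.exp(-(x - 1.0)))       # sigmoid of x
x_obs = np.where(rng.random(n) < miss_prob, np.nan, x)

# A deliberately naive "classifier": mean-impute, then threshold X.
x_filled = np.where(np.isnan(x_obs), np.nanmean(x_obs), x_obs)
pred = (x_filled > 0.4).astype(int)

# Demographic parity gap: difference in positive-prediction rates by group.
dp_gap = abs(pred[a == 1].mean() - pred[a == 0].mean())
print(f"missing rate: {np.isnan(x_obs).mean():.2f}")
print(f"demographic parity gap: {dp_gap:.2f}")
```

Because A influences both X and the missingness mechanism (through X), the imputed feature carries group information, and the naive classifier's positive rate differs across groups — the kind of fairness degradation under MNAR shift that the paper measures.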
Problem

Research questions and friction points this paper is trying to address.

causal pre-training
algorithmic fairness
tabular foundation models
distributional shift
missing-not-at-random
Innovation

Methods, ideas, or system contributions that make the work stand out.

causal pre-training
algorithmic fairness
TabPFN
in-context learning
distributional shift