🤖 AI Summary
Causal discovery from real-world tabular data remains challenging due to the scarcity of labeled causal structures and the high computational cost of conventional methods. Method: This work investigates whether frozen representations from the pretrained tabular foundation model TabPFN implicitly encode causal structural information. We propose a lightweight adaptation framework that introduces a learnable decoder and causal tokens to reconstruct the adjacency matrix of the causal graph directly from TabPFN's frozen middle-layer embeddings, without fine-tuning the backbone model. Contribution/Results: Evaluated on synthetic structural causal models generated by Transformers, our approach demonstrates that TabPFN's middle-layer representations carry strong causal signal: the decoded causal graphs significantly outperform classical constraint-based (e.g., PC) and score-based (e.g., GES) algorithms. To our knowledge, this is the first study to establish a general-purpose tabular foundation model as a knowledge source for causal discovery, opening a new paradigm for fine-tuning-free, representation-based causal inference.
📝 Abstract
Causal discovery is fundamental to many scientific domains, yet extracting causal information from real-world data remains a significant challenge. Motivated by TabPFN's recent success on real-world data, we investigate whether TabPFN, a transformer-based tabular foundation model pre-trained on synthetic datasets generated from structural causal models, encodes causal information in its internal representations. We develop an adapter framework with a learnable decoder and causal tokens that extracts causal signals from TabPFN's frozen embeddings and decodes them into adjacency matrices for causal discovery. Our evaluations demonstrate that TabPFN's embeddings contain causal information, outperforming several traditional causal discovery algorithms, with this causal information concentrated in the mid-range layers. These findings establish a new direction for interpretable and adaptable foundation models and demonstrate the potential of leveraging pre-trained tabular models for causal discovery.
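The adapter pipeline described above (learnable causal tokens plus a decoder reading frozen mid-layer embeddings, emitting an adjacency matrix) could be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the random placeholder embeddings, the per-variable causal tokens, and the bilinear edge scorer are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5   # number of variables (table columns)
h = 16  # embedding dimension of the frozen backbone

# Stand-in for frozen mid-layer TabPFN embeddings, one vector per variable.
# In the real setup these come from the pretrained model; here they are
# random placeholders (an assumption for illustration only).
frozen_emb = rng.normal(size=(d, h))

# Learnable "causal token" per variable, added to the frozen embedding
# so the backbone itself never needs fine-tuning.
causal_tokens = rng.normal(scale=0.1, size=(d, h))

# Lightweight learnable decoder: a bilinear scorer W, so the logit for a
# directed edge i -> j is z_i @ W @ z_j (an illustrative choice).
W = rng.normal(scale=0.1, size=(h, h))

z = frozen_emb + causal_tokens        # adapted representations, shape (d, h)
logits = z @ W @ z.T                  # pairwise edge logits, shape (d, d)
np.fill_diagonal(logits, -1e9)        # forbid self-loops

# Threshold edge probabilities to get a binary adjacency matrix.
adjacency = (1.0 / (1.0 + np.exp(-logits)) > 0.5).astype(int)
print(adjacency.shape)  # (5, 5)
```

In training, only `causal_tokens` and `W` would receive gradients (against a reconstruction loss on the ground-truth adjacency from the synthetic SCMs), which is what keeps the adaptation lightweight.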