🤖 AI Summary
This work addresses the challenge of efficiently and accurately estimating feature importance (such as Shapley values) when access to the target model is unavailable or computational resources are limited. The authors propose ExplainerPFN, a zero-shot explainer based on TabPFN that is pretrained on synthetic data generated from structural causal models. Notably, ExplainerPFN directly predicts Shapley values without requiring access to the target model, its gradients, or any real-world examples. This makes it the first fully model-free, reference-free method for zero-shot Shapley value estimation, and it extends naturally to few-shot settings with only a handful of observations. Experiments demonstrate that ExplainerPFN matches or exceeds the performance of surrogate explainers that rely on 2–10 SHAP-computed examples, offering both high efficiency and practical utility. The complete training pipeline and data generator are openly released.
📝 Abstract
Computing the importance of features in supervised classification tasks is critical for model interpretability. Shapley values are a widely used approach for explaining model predictions, but they require direct access to the underlying model, an assumption frequently violated in real-world deployments. Further, even when model access is possible, their exact computation may be prohibitively expensive. We investigate whether meaningful Shapley value estimates can be obtained in a zero-shot setting, using only the input data distribution and no evaluations of the target model. To this end, we introduce ExplainerPFN, a tabular foundation model built on TabPFN that is pretrained on synthetic datasets generated from random structural causal models and supervised using exact or near-exact Shapley values. Once trained, ExplainerPFN predicts feature attributions for unseen tabular datasets without model access, gradients, or example explanations. Our contributions are fourfold: (1) we show that few-shot learning-based explanations can achieve high fidelity to SHAP values with as few as two reference observations; (2) we propose ExplainerPFN, the first zero-shot method for estimating Shapley values without access to the underlying model or reference explanations; (3) we provide an open-source implementation of ExplainerPFN, including the full training pipeline and synthetic data generator; and (4) through extensive experiments on real and synthetic datasets, we show that ExplainerPFN achieves performance competitive with few-shot surrogate explainers that rely on 2–10 SHAP examples.
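To make concrete why "exact Shapley values" (the supervision target mentioned above) are expensive to compute, here is a minimal, hedged sketch of the textbook definition: each feature's attribution is a weighted average of its marginal contribution over all coalitions of the other features, which requires enumerating exponentially many subsets. The function names and the toy linear payoff below are illustrative assumptions, not code from the paper.

```python
from itertools import combinations
from math import factorial

def exact_shapley(n_features, value_fn):
    """Exact Shapley values by enumerating every feature coalition.

    value_fn maps a frozenset of feature indices to a scalar payoff.
    The loop visits all 2^(n-1) coalitions per feature, which is why
    exact computation is only feasible for small n.
    """
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                S = frozenset(subset)
                # Classic Shapley coalition weight: |S|! (n-|S|-1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                # Marginal contribution of feature i to coalition S
                phi[i] += weight * (value_fn(S | {i}) - value_fn(S))
    return phi

# Toy linear game (an assumption for illustration): the payoff of a
# coalition is the sum of its features' weights, so each feature's
# Shapley value recovers its own weight.
weights = [1.0, 2.0, 3.0]
v = lambda S: sum(weights[j] for j in S)
print([round(p, 6) for p in exact_shapley(3, v)])
```

For a linear game like this the attributions are trivially the weights themselves; real models have feature interactions, which is exactly what the coalition averaging accounts for and what makes the computation exponential.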