🤖 AI Summary
Outlier detection (OD) faces a key practical limitation: without labeled supervision, model selection and hyperparameter tuning are difficult. To address this, we propose FoMo-0D, the first zero-shot foundation model for tabular OD, which outputs anomaly scores directly without fine-tuning, ground-truth labels, or manual model selection. Its core contributions are: (1) establishing the first zero-shot OD paradigm; (2) pretraining on large-scale synthetic data to encode statistical priors for this unsupervised task; and (3) a lightweight architecture for efficient inference. Evaluated on 57 real-world tabular datasets against 26 state-of-the-art baselines, FoMo-0D significantly outperforms the vast majority of them and is statistically on par with the second-best method, with an average per-sample inference latency of just 7.7 ms (at least 7x faster than previous methods), while enabling cross-dataset plug-and-play deployment.
📝 Abstract
Outlier detection (OD) has a vast literature, as it finds numerous real-world applications. Being an inherently unsupervised task, OD lacks label supervision, making model selection a key bottleneck. Although many OD techniques are available to choose from, algorithm and hyperparameter selection remain challenging, limiting OD's effective use in practice. In this paper, we present FoMo-0D, a pre-trained Foundation Model for zero/0-shot OD on tabular data, which bypasses the hurdle of model selection. To overcome the difficulty of labeled data collection, FoMo-0D is trained on synthetic data and can directly predict the (outlier/inlier) label of test samples without parameter fine-tuning -- obviating the need to choose an algorithm/architecture and tune its associated hyperparameters when given a new OD dataset. Extensive experiments on 57 real-world datasets against 26 baselines show that FoMo-0D significantly outperforms the vast majority of the baselines and is statistically no different from the 2nd best method, with an average inference time of 7.7 ms per sample, offering at least a 7x speed-up compared to previous methods. To facilitate future research, our implementations and checkpoints are openly available at https://anonymous.4open.science/r/PFN40D.
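The zero-shot workflow the abstract describes, where a frozen pretrained model scores a new table's samples in a single pass with no fitting, tuning, or labels, can be illustrated with a toy stand-in. Below, a k-nearest-neighbor distance plays the role of FoMo-0D's pretrained network (the actual model is pretrained on synthetic data; this is only a sketch of the interface). The function name `zero_shot_scores` and its signature are illustrative assumptions, not the paper's API.

```python
import numpy as np

def zero_shot_scores(X_context, X_query, k=5):
    """Hypothetical zero-shot OD interface: score each query row against
    the context table in one pass, with no training on this dataset.
    Stand-in scorer: mean distance to the k nearest context points
    (higher score = more anomalous)."""
    # Pairwise Euclidean distances: (n_query, n_context)
    d = np.linalg.norm(X_query[:, None, :] - X_context[None, :, :], axis=-1)
    d.sort(axis=1)                       # nearest context points first
    return d[:, :k].mean(axis=1)

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(200, 4))   # a toy tabular dataset
outlier = np.full((1, 4), 8.0)                   # far from the inlier cloud
queries = np.vstack([inliers[:5], outlier])
scores = zero_shot_scores(inliers, queries)      # no fit/tune step anywhere
print(scores.round(2))
```

The point of the interface is that nothing is learned from the queried dataset itself: the same frozen scorer can be applied to any new table, which is what makes the deployment plug-and-play.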