MacrOData: New Benchmarks of Thousands of Datasets for Tabular Outlier Detection

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing tabular anomaly detection benchmarks, whose small scale and insufficient diversity hinder reliable evaluation. To this end, we introduce MacrOData, a large-scale benchmark comprising 2,446 real and synthetic datasets organized into three subsets: OddBench, OvrBench, and SynBench. MacrOData provides standardized train/test splits, public/private partitions, and rich semantic metadata, and it is the first benchmark to achieve thousand-scale dataset coverage with diverse anomaly types, semantic annotations, a private test set, and an online leaderboard hosted on Hugging Face. Through systematic data collection and curation that integrates real semantic anomalies, statistical outliers, and synthetic generation strategies, the benchmark substantially improves evaluation diversity, statistical power, and reproducibility. We comprehensively evaluate classical, deep learning, and foundation model approaches, and publicly release the datasets, experimental results, and practical guidelines.
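As a rough illustration of why thousand-scale coverage matters for statistical power, the sketch below compares two hypothetical OD methods by their per-dataset AUROC using a paired Wilcoxon signed-rank test across 2,446 datasets. The scores are simulated placeholders, not results from the paper.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n_datasets = 2446  # combined size of OddBench, OvrBench, and SynBench

# Simulated per-dataset AUROC scores for two hypothetical methods; method B
# is better by a small, noisy margin (illustrative numbers only).
auroc_a = rng.normal(loc=0.78, scale=0.08, size=n_datasets).clip(0.5, 1.0)
auroc_b = (auroc_a + rng.normal(loc=0.005, scale=0.03, size=n_datasets)).clip(0.5, 1.0)

# A paired test across thousands of datasets can resolve small but consistent
# differences that a benchmark of only a few dozen datasets typically cannot.
stat, p_value = wilcoxon(auroc_b, auroc_a, alternative="greater")
print(f"mean AUROC A = {auroc_a.mean():.3f}, B = {auroc_b.mean():.3f}")
print(f"Wilcoxon signed-rank p-value (B > A): {p_value:.2e}")
```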

📝 Abstract
Quality benchmarks are essential for fairly and accurately tracking scientific progress and for enabling practitioners to make informed methodological choices. Outlier detection (OD) on tabular data underpins numerous real-world applications, yet existing OD benchmarks remain limited. The prominent OD benchmark AdBench is the de facto standard in the literature, yet it comprises only 57 datasets; beyond other shortcomings discussed in this work, its small scale severely restricts diversity and statistical power. We introduce MacrOData, a large-scale benchmark suite for tabular OD comprising three carefully curated components: OddBench, with 790 datasets containing real-world semantic anomalies; OvrBench, with 856 datasets featuring real-world statistical outliers; and SynBench, with 800 synthetically generated datasets spanning diverse data priors and outlier archetypes. Owing to its scale and diversity, MacrOData enables comprehensive and statistically robust evaluation of tabular OD methods. Our benchmarks further satisfy several key desiderata: we provide standardized train/test splits for all datasets and public/private benchmark partitions, with the private test labels held out for an online leaderboard, and we annotate every dataset with semantic metadata. We conduct extensive experiments across all benchmarks, evaluating a broad range of OD methods spanning classical, deep, and foundation models over diverse hyperparameter configurations. We report detailed empirical findings, practical guidelines, and individual performance results as references for future research. All benchmarks, containing 2,446 datasets combined, are open-sourced, along with a publicly accessible leaderboard hosted at https://huggingface.co/MacrOData-CMU.
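The abstract describes standardized train/test splits and a Hugging Face-hosted leaderboard; the sketch below shows how one might load a single benchmark dataset and score a classical OD baseline on its test split. The repository id, column names, and split layout are assumptions for illustration, not the published data format.

```python
import numpy as np
from datasets import load_dataset
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

# Hypothetical repository id under the MacrOData-CMU organization
# referenced in the abstract; the real dataset ids may differ.
ds = load_dataset("MacrOData-CMU/oddbench-example")

def to_xy(split):
    # Assumes a flat numeric table with a binary "label" column (1 = outlier).
    df = split.to_pandas()
    y = df.pop("label").to_numpy()
    return df.to_numpy(dtype=float), y

X_train, _ = to_xy(ds["train"])     # labels unused by the unsupervised detector
X_test, y_test = to_xy(ds["test"])

# Classical baseline: Isolation Forest; negate score_samples so that
# larger values mean more anomalous.
model = IsolationForest(random_state=0).fit(X_train)
scores = -model.score_samples(X_test)
print(f"AUROC: {roc_auc_score(y_test, scores):.3f}")
```

The same loop, repeated over every dataset in a subset, would yield the per-dataset scores that the benchmark's leaderboard and statistical comparisons are built on.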
Problem

Research questions and friction points this paper is trying to address.

outlier detection
tabular data
benchmark
dataset diversity
statistical power
Innovation

Methods, ideas, or system contributions that make the work stand out.

tabular outlier detection
large-scale benchmark
dataset diversity
standardized evaluation
semantic metadata