Calibrating Tabular Anomaly Detection via Optimal Transport

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited generalization of existing tabular anomaly detection (TAD) methods due to data heterogeneity. We propose CTAD, a model-agnostic post-hoc calibration framework that, for the first time, introduces optimal transport into TAD. CTAD calibrates anomaly scores at the sample level by quantifying the perturbation required to align a test sample’s compatibility with both the empirical distribution of normal data and a K-means structural distribution. Theoretical analysis shows that anomalous samples induce larger perturbations. Notably, CTAD is parameter-free and compatible with diverse detectors. Evaluated across 34 heterogeneous datasets, it consistently enhances the performance of seven representative TAD methods—spanning density estimation, classification, reconstruction, and isolation paradigms—including state-of-the-art deep models, while demonstrating robustness to hyperparameter settings.

📝 Abstract
Tabular anomaly detection (TAD) remains challenging due to the heterogeneity of tabular data: features lack natural relationships, vary widely in distribution and scale, and exhibit diverse types. Consequently, each TAD method makes implicit assumptions about anomaly patterns that work well on some datasets but fail on others, and no single method consistently outperforms the others across diverse scenarios. We present CTAD (Calibrating Tabular Anomaly Detection), a model-agnostic post-processing framework that enhances any existing TAD detector through sample-specific calibration. Our approach characterizes normal data via two complementary distributions, i.e., an empirical distribution from random sampling and a structural distribution from K-means centroids, and measures how adding a test sample disrupts their compatibility using Optimal Transport (OT) distance. Normal samples maintain low disruption while anomalies cause high disruption, providing a calibration signal to amplify detection. We prove that the OT distance has a lower bound proportional to the test sample's distance from the centroids, and establish that anomalies systematically receive higher calibration scores than normals in expectation, explaining why the method generalizes across datasets. Extensive experiments on 34 diverse tabular datasets with seven representative detectors spanning all major TAD categories (density estimation, classification, reconstruction, and isolation-based methods) demonstrate that CTAD consistently improves performance with statistical significance. Remarkably, CTAD enhances even state-of-the-art deep learning methods and shows robust performance across diverse hyperparameter settings, requiring no additional tuning for practical deployment.
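The disruption idea in the abstract can be sketched concretely. The snippet below is a minimal illustration, not the paper's implementation: it computes an exact discrete OT distance between two uniform point clouds via a linear program, then scores a test sample by how much appending it to an empirical sample of normal data shifts the OT distance to the K-means structural distribution. All function names and parameter values (`k`, `n_ref`) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.cluster.vq import kmeans2

def ot_distance(X, Y):
    """Exact discrete optimal transport (earth mover's) distance between two
    uniformly weighted point clouds X (n, d) and Y (m, d), solved as an LP."""
    n, m = len(X), len(Y)
    # Cost vector: pairwise Euclidean distances, flattened row-major.
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1).ravel()
    # Marginal constraints: each row of the plan sums to 1/n, each column to 1/m.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

def calibration_score(x, X_normal, k=3, n_ref=32, seed=0):
    """Disruption-style calibration signal (illustrative): OT distance from an
    empirical sample of normal data *with* the test point appended to the
    K-means structural distribution, minus the same distance *without* it."""
    rng = np.random.default_rng(seed)
    ref = X_normal[rng.choice(len(X_normal), size=n_ref, replace=False)]
    centroids, _ = kmeans2(X_normal, k, minit="++", seed=seed)
    base = ot_distance(ref, centroids)                         # normals alone
    perturbed = ot_distance(np.vstack([ref, [x]]), centroids)  # test sample added
    return perturbed - base  # anomalies should induce the larger perturbation
```

On synthetic data drawn from a single Gaussian blob, a far-away point receives a visibly larger score than a point near the blob's center, matching the abstract's claim that anomalies cause higher disruption; the score can then be combined with any base detector's output.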
Problem

Research questions and friction points this paper is trying to address.

Tabular anomaly detection
Data heterogeneity
Anomaly pattern assumptions
Model generalization
Optimal Transport
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal Transport
Tabular Anomaly Detection
Model-agnostic Calibration
Distribution Disruption
K-means Centroids