ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing tabular anomaly detection (AD) benchmarks provide only raw data points, omitting the semantic context—such as feature descriptions and domain knowledge—that experts rely on in practice, which limits both detection performance and interpretability. To address this, the authors introduce ReTabAD, a benchmark that restores textual semantics to enable context-aware tabular AD research. ReTabAD provides 20 carefully curated tabular datasets enriched with structured textual metadata, along with implementations of classical, deep learning, and LLM-based AD algorithms. The authors also propose a zero-shot LLM framework that jointly leverages tabular structure and semantic metadata for context-aware detection without task-specific training, establishing a strong baseline for future work. Experiments show that incorporating semantic context improves detection performance and enhances interpretability through domain-aware reasoning, positioning ReTabAD as a standardized benchmark for the systematic exploration of context-aware AD.

📝 Abstract
In tabular anomaly detection (AD), textual semantics often carry critical signals, as the definition of an anomaly is closely tied to domain-specific context. However, existing benchmarks provide only raw data points without semantic context, overlooking rich textual metadata such as feature descriptions and domain knowledge that experts rely on in practice. This limitation restricts research flexibility and prevents models from fully leveraging domain knowledge for detection. ReTabAD addresses this gap by restoring textual semantics to enable context-aware tabular AD research. We provide (1) 20 carefully curated tabular datasets enriched with structured textual metadata, together with implementations of state-of-the-art AD algorithms including classical, deep learning, and LLM-based approaches, and (2) a zero-shot LLM framework that leverages semantic context without task-specific training, establishing a strong baseline for future research. Furthermore, this work provides insights into the role and utility of textual metadata in AD through experiments and analysis. Results show that semantic context improves detection performance and enhances interpretability by supporting domain-aware reasoning. These findings establish ReTabAD as a benchmark for systematic exploration of context-aware AD.
Problem

Research questions and friction points this paper is trying to address.

Restoring textual semantics in tabular anomaly detection
Addressing lack of semantic context in existing AD benchmarks
Enabling domain-aware reasoning through enriched metadata
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enriched datasets with structured textual metadata
Zero-shot LLM framework leveraging semantic context
Implementations of classical, deep learning, and LLM-based AD algorithms
Authors

Sanghyu Yoon (LG AI Research, Seoul, South Korea)
Dongmin Kim (LG AI Research, Seoul, South Korea)
Suhee Yoon (LG AI Research, Seoul, South Korea)
Ye Seul Sim (LG AI Research, Seoul, South Korea)
Seungdong Yoa (Korea University; machine learning, computer vision, deep learning)
Hye-Seung Cho (LG AI Research, Seoul, South Korea)
Soonyoung Lee (LG AI Research; computer vision, machine learning)
Hankook Lee (Assistant Professor, Department of Computer Science and Engineering, Sungkyunkwan University; machine learning, deep learning)
Woohyung Lim (LG AI Research; deep learning, representation learning, anomaly detection, time-series forecasting)