🤖 AI Summary
Existing tabular anomaly detection methods rely on strong assumptions about anomaly patterns and generalize poorly, while applying large language models (LLMs) directly to tabular data is hampered by heterogeneous features and the risk of privacy leakage. To address these issues, we propose LLM-DAS, a novel framework that redefines the LLM’s role from “data processor” to “algorithm designer.” Given only a high-level description of an anomaly detector, LLM-DAS generates data-agnostic, reusable Python programs that inject synthetic anomalies into training data. Because the LLM never accesses raw sensitive data, the approach preserves privacy while remaining transferable across datasets. Evaluated on 36 real-world tabular anomaly detection tasks, LLM-DAS consistently improves the performance of mainstream detectors, demonstrating its effectiveness, its generalization across datasets, and its practical utility in privacy-sensitive scenarios.
📝 Abstract
Existing anomaly detection (AD) methods for tabular data typically rely on assumptions about anomaly patterns, leading to inconsistent performance in real-world scenarios. While Large Language Models (LLMs) show remarkable reasoning capabilities, their direct application to tabular AD is impeded by fundamental challenges, including difficulties in processing heterogeneous data and significant privacy risks. To address these limitations, we propose LLM-DAS, a novel framework that repositions the LLM from a “data processor” to an “algorithmist”. Instead of being exposed to raw data, the LLM is used for its ability to reason about algorithms. It analyzes a high-level description of a given detector to understand its intrinsic weaknesses and then generates detector-specific, data-agnostic Python code that synthesizes “hard-to-detect” anomalies exploiting these weaknesses. The generated synthesis program, which is reusable across diverse datasets, is then instantiated on each dataset to augment its training data, systematically enhancing the detector's robustness by transforming the problem into a more discriminative two-class classification task. Extensive experiments on 36 tabular anomaly detection (TAD) benchmarks show that LLM-DAS consistently boosts the performance of mainstream detectors. By bridging LLM reasoning with classic AD algorithms via programmatic synthesis, LLM-DAS offers a scalable, effective, and privacy-preserving approach to patching the logical blind spots of existing detectors.
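To make the augmentation step concrete, the sketch below shows what a data-agnostic synthesis program and the resulting two-class training set might look like. It is an illustration only, not the code LLM-DAS generates: the function names, the single-feature shift strategy, and the `shift` parameter are all assumptions introduced here, and the program an LLM writes for a specific detector would target that detector's particular weaknesses.

```python
import random
import statistics


def synthesize_hard_anomalies(normal_rows, n_anomalies, shift=1.5, seed=0):
    """Hypothetical synthesis program that works on any numeric table.

    Each synthetic anomaly copies a randomly chosen normal row and moves one
    randomly chosen feature by `shift` standard deviations, so the point stays
    near the normal data rather than being an obvious outlier.
    """
    rng = random.Random(seed)
    n_features = len(normal_rows[0])
    # Per-feature spread, estimated from the normal training rows only.
    stds = [
        statistics.pstdev([row[j] for row in normal_rows])
        for j in range(n_features)
    ]
    anomalies = []
    for _ in range(n_anomalies):
        row = list(rng.choice(normal_rows))      # start from a normal sample
        j = rng.randrange(n_features)            # feature to perturb
        direction = rng.choice((-1.0, 1.0))      # shift up or down
        row[j] += direction * shift * stds[j]
        anomalies.append(row)
    return anomalies


def build_two_class_training_set(normal_rows, n_anomalies, seed=0):
    """Label normal rows 0 and synthetic anomalies 1 (assumed convention)."""
    anomalies = synthesize_hard_anomalies(normal_rows, n_anomalies, seed=seed)
    features = [list(row) for row in normal_rows] + anomalies
    labels = [0] * len(normal_rows) + [1] * len(anomalies)
    return features, labels


# Toy stand-in for a dataset's normal training rows (made-up values).
normal = [[1.0, 10.0], [1.2, 9.5], [0.8, 10.5], [1.1, 10.2]]
X, y = build_two_class_training_set(normal, n_anomalies=2)
```

The same function can be applied unchanged to another numeric table, which is the sense in which such a program is reusable across datasets; any binary classifier could then be trained on `X` and `y`.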