LLM as an Algorithmist: Enhancing Anomaly Detectors via Programmatic Synthesis

📅 2025-10-04
🤖 AI Summary
Existing tabular anomaly detection methods rely on strong assumptions about anomaly patterns and generalize poorly, while applying large language models (LLMs) directly to tabular data is hampered by heterogeneous features and privacy leakage risks. To address these issues, we propose LLM-DAS, a framework that shifts the LLM’s role from “data processor” to “algorithm designer.” Given only a high-level description of a detector's logic, LLM-DAS generates data-agnostic, reusable Python programs that synthesize hard-to-detect anomalies for augmenting training data. It operates without accessing raw sensitive data, which preserves privacy and lets the generated program transfer across datasets. Evaluated on 36 real-world tabular anomaly detection tasks, LLM-DAS consistently improves the performance of mainstream detectors, supporting its use in privacy-sensitive scenarios.

📝 Abstract
Existing anomaly detection (AD) methods for tabular data usually rely on specific assumptions about anomaly patterns, leading to inconsistent performance in real-world scenarios. While Large Language Models (LLMs) show remarkable reasoning capabilities, their direct application to tabular AD is impeded by fundamental challenges, including difficulties in processing heterogeneous data and significant privacy risks. To address these limitations, we propose LLM-DAS, a novel framework that repositions the LLM from a "data processor" to an "algorithmist". Instead of being exposed to raw data, our framework leverages the LLM's ability to reason about algorithms. The LLM analyzes a high-level description of a given detector to understand its intrinsic weaknesses and then generates detector-specific, data-agnostic Python code to synthesize "hard-to-detect" anomalies that exploit these vulnerabilities. This generated synthesis program, which is reusable across diverse datasets, is then instantiated to augment training data, systematically enhancing the detector's robustness by transforming the problem into a more discriminative two-class classification task. Extensive experiments on 36 tabular anomaly detection (TAD) benchmarks show that LLM-DAS consistently boosts the performance of mainstream detectors. By bridging LLM reasoning with classic AD algorithms via programmatic synthesis, LLM-DAS offers a scalable, effective, and privacy-preserving approach to patching the logical blind spots of existing detectors.
Problem

Research questions and friction points this paper is trying to address.

Enhancing anomaly detectors via programmatic synthesis of hard-to-detect anomalies
Addressing inconsistent performance of tabular anomaly detection methods
Overcoming LLM limitations in processing heterogeneous data and privacy risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM generates anomaly synthesis code
Code creates hard-to-detect adversarial examples
Programmatic approach enhances detector robustness
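
The three points above can be illustrated with a minimal sketch. It is not the program that LLM-DAS generates. It only assumes what such a data-agnostic synthesis program might look like. The function names, the `shift_in_std` parameter, and the perturbation rule (shifting a few features of a copied normal row) are all hypothetical choices made for illustration.

```python
import random
import statistics


def synthesize_hard_anomalies(normal_rows, n_anomalies, shift_in_std=3.0,
                              n_features_to_perturb=1, seed=0):
    """Hypothetical stand-in for an LLM-generated synthesis program.

    Copies randomly chosen normal rows and shifts a few features by
    `shift_in_std` standard deviations, so each synthetic anomaly stays
    identical to a normal row in every other feature.
    """
    rng = random.Random(seed)
    n_features = len(normal_rows[0])
    # Statistics come from whatever table is passed in, so the same
    # program can be reused on a different dataset without changes.
    stds = [statistics.pstdev([row[j] for row in normal_rows])
            for j in range(n_features)]
    anomalies = []
    for _ in range(n_anomalies):
        row = list(rng.choice(normal_rows))
        for j in rng.sample(range(n_features), n_features_to_perturb):
            direction = rng.choice((-1.0, 1.0))
            row[j] += direction * shift_in_std * stds[j]
        anomalies.append(row)
    return anomalies


def build_two_class_training_set(normal_rows, synthetic_anomalies):
    """Label normal rows 0 and synthetic anomalies 1."""
    features = [list(r) for r in normal_rows]
    features += [list(r) for r in synthetic_anomalies]
    labels = [0] * len(normal_rows) + [1] * len(synthetic_anomalies)
    return features, labels


# Toy numeric table standing in for the normal training data.
normal = [[1.0, 10.0], [1.2, 9.5], [0.8, 10.5], [1.1, 10.2]]
synthetic = synthesize_hard_anomalies(normal, n_anomalies=2)
X, y = build_two_class_training_set(normal, synthetic)
```

The labeled pair `X, y` could then be passed to any two-class classifier. The synthesis function is written against the table's shape and statistics rather than its contents, which is what lets it run locally on data the LLM never sees.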
Hangting Ye
Jilin University
Machine Learning, Data Mining
Jinmeng Li
School of Artificial Intelligence, Jilin University
He Zhao
CSIRO’s Data61, Monash University
Mingchen Zhuge
KAUST AI
Multimodal, LLM, AI Agents, Code Generation
Dandan Guo
School of Artificial Intelligence, Jilin University
Yi Chang
School of Artificial Intelligence, Jilin University
Hongyuan Zha
The Chinese University of Hong Kong, Shenzhen
machine learning