Diffusion-Scheduled Denoising Autoencoders for Anomaly Detection in Tabular Data

📅 2025-08-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenges of complex feature interactions and scarce anomalous samples in tabular anomaly detection, this paper proposes the Diffusion-Scheduled Denoising Autoencoder (DDAE), a denoising-autoencoder framework that integrates diffusion-model noise scheduling with contrastive learning. DDAE embeds temporally scheduled, diffusion-style noise injection into an encoder-decoder architecture and employs contrastive learning to enhance the representational separability between normal and anomalous samples, improving discriminative capability under both semi-supervised and unsupervised settings. Extensive experiments across 57 benchmark datasets show substantial gains under semi-supervised evaluation: PR-AUC improves by up to 65% over state-of-the-art autoencoder baselines (9% over diffusion baselines), and ROC-AUC by 16% (6%). These results support DDAE's robustness and generalization across diverse anomaly distributions.

📝 Abstract
Anomaly detection in tabular data remains challenging due to complex feature interactions and the scarcity of anomalous examples. Denoising autoencoders rely on fixed-magnitude noise, limiting adaptability to diverse data distributions. Diffusion models introduce scheduled noise and iterative denoising, but lack explicit reconstruction mappings. We propose the Diffusion-Scheduled Denoising Autoencoder (DDAE), a framework that integrates diffusion-based noise scheduling and contrastive learning into the encoding process to improve anomaly detection. We evaluated DDAE on 57 datasets from ADBench. Our method outperforms baselines in semi-supervised settings and achieves competitive results in unsupervised settings, improving PR-AUC by up to 65% over state-of-the-art autoencoder baselines (9% over diffusion baselines) and ROC-AUC by 16% (6%). We observed that higher noise levels benefit unsupervised training, while lower noise with linear scheduling is optimal in semi-supervised settings. These findings underscore the importance of principled noise strategies in tabular anomaly detection.
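The abstract's core idea of replacing a denoising autoencoder's fixed-magnitude noise with a diffusion-style schedule can be sketched with the standard DDPM forward process: a linear variance schedule defines how strongly a clean tabular row is corrupted at each timestep. The paper does not publish its exact formulation here, so the schedule endpoints (`beta_start`, `beta_end`) and the closed-form noising equation below are the generic DDPM conventions, used only as an assumption for illustration:

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (generic DDPM defaults, assumed here)."""
    return np.linspace(beta_start, beta_end, T)

def noisy_input(x0, t, alpha_bar, rng):
    """Corrupt a clean sample x0 at timestep t via the closed-form
    forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

T = 100
betas = linear_beta_schedule(T)
alpha_bar = np.cumprod(1.0 - betas)  # signal fraction remaining at each t

rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)                          # one 8-feature tabular row
xt_early, _ = noisy_input(x0, 0, alpha_bar, rng)     # mild corruption
xt_late, _ = noisy_input(x0, T - 1, alpha_bar, rng)  # heavy corruption
```

Under this schedule the signal coefficient `alpha_bar[t]` decays monotonically, which is what lets the encoder-decoder see a controlled range of corruption levels instead of a single fixed noise magnitude.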
Problem

Research questions and friction points this paper is trying to address.

Challenges in detecting anomalies in tabular data due to complex feature interactions
Limitations of fixed-magnitude noise in denoising autoencoders for diverse data
Lack of explicit reconstruction mappings in diffusion models for anomaly detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates diffusion-based noise scheduling into the denoising autoencoder
Uses contrastive learning in the encoding process to separate normal and anomalous representations
Tunes noise strategy per setting: higher noise for unsupervised training, lower noise with linear scheduling for semi-supervised training
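The contrastive component listed above can be illustrated with a generic InfoNCE-style loss over encoder embeddings, where each sample's embedding is pulled toward its positive (e.g. a differently-noised view of the same row) and pushed from the rest of the batch. The paper does not specify its exact contrastive objective here, so this numpy sketch, including the `temperature` value, is an assumption:

```python
import numpy as np

def info_nce(z_anchor, z_pos, temperature=0.5):
    """Generic InfoNCE contrastive loss (assumed form, not the paper's exact
    objective). Positives sit on the diagonal of the similarity matrix."""
    za = z_anchor / np.linalg.norm(z_anchor, axis=1, keepdims=True)
    zp = z_pos / np.linalg.norm(z_pos, axis=1, keepdims=True)
    logits = za @ zp.T / temperature                  # cosine similarities
    # log-softmax over each row; the matched pair is entry (i, i)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(za))
    return -logp[idx, idx].mean()

rng = np.random.default_rng(1)
z = rng.standard_normal((16, 4))
loss_aligned = info_nce(z, z)        # positives match: low loss
loss_shuffled = info_nce(z, z[::-1]) # positives mismatched: higher loss
```

Lower loss for correctly matched pairs is what encourages the encoder to map different noised views of the same normal sample close together, which in turn makes anomalous samples easier to separate.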