Multiple-Input Variational Auto-Encoder for Anomaly Detection in Heterogeneous Data

📅 2025-01-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address low detection accuracy in anomaly detection on non-IID data with heterogeneous feature subsets, this paper proposes the Multiple-Input Auto-Encoder for Anomaly Detection (MIAEAD) and the Multiple-Input Variational Auto-Encoder (MIVAE). The main contributions are: (1) a multiple-input architecture whose sub-encoders each process a distinct feature subset; (2) a theoretical proof that MIVAE separates the average anomaly scores of normal samples and anomalies more widely than a standard VAE-based detector (VAEAD), which yields a higher AUC; and (3) an analysis of feature-subset heterogeneity based on the coefficient of variation (CV). MIAEAD scores each feature subset by the reconstruction error of its sub-encoder and reports the maximum AUC across sub-datasets, while MIVAE learns the distribution of normal data in the latent space and flags samples that deviate from it. On eight real-world datasets, the two models improve AUC by up to 6% over conventional and state-of-the-art unsupervised methods, and they achieve high AUC on feature subsets with low heterogeneity.
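The coefficient of variation mentioned in the summary is the standard deviation divided by the mean. The sketch below shows one way it could be used to compare the heterogeneity of feature subsets. It is an illustration only: the helper names, the toy data, and the choice to average per-feature CV values within a subset are assumptions, not the procedure described in the paper.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = population standard deviation / |mean|."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / abs(m)


def subset_cv(rows, columns):
    """Average the per-feature CV over the columns of one feature subset.

    Averaging is an assumption made for this sketch, not the paper's rule.
    """
    return mean(coefficient_of_variation([row[c] for row in rows]) for c in columns)


# Hypothetical toy data: 4 samples, 4 features.
# Columns 0-1 form subset "A", columns 2-3 form subset "B".
rows = [
    [1.0, 2.0, 10.0, 1.0],
    [1.0, 2.0, 20.0, 5.0],
    [1.0, 2.0, 30.0, 9.0],
    [1.0, 2.0, 40.0, 13.0],
]
subsets = {"A": [0, 1], "B": [2, 3]}

cv_scores = {name: subset_cv(rows, cols) for name, cols in subsets.items()}
lowest = min(cv_scores, key=cv_scores.get)  # subset with the lowest heterogeneity
```

In this toy example, subset "A" has constant features and therefore the lower CV, so it would be treated as the less heterogeneous subset.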

📝 Abstract
Anomaly detection (AD) plays a pivotal role in AI applications, e.g., in classification and in intrusion/threat detection in cybersecurity. However, most existing methods struggle with the heterogeneity amongst feature subsets that arises in non-independent and identically distributed (non-IID) data. To address this, we propose a novel neural network model called Multiple-Input Auto-Encoder for AD (MIAEAD). MIAEAD assigns an anomaly score to each feature subset of a data sample to indicate its likelihood of being an anomaly, using the reconstruction error of the corresponding sub-encoder as the score. All sub-encoders are trained simultaneously using unsupervised learning to determine the anomaly scores of the feature subsets. The AUC of MIAEAD is calculated for each sub-dataset, and the maximum AUC obtained among the sub-datasets is reported as the final AUC. To exploit the ability of generative models to model the distribution of normal data, we further develop a novel neural network architecture called Multiple-Input Variational Auto-Encoder (MIVAE). MIVAE processes feature subsets through its sub-encoders before learning the distribution of normal data in the latent space, which allows it to identify anomalies that deviate from the learned distribution. We theoretically prove that the difference in the average anomaly score between normal samples and anomalies obtained by MIVAE is greater than that obtained by the Variational Auto-Encoder for AD (VAEAD), resulting in a higher AUC for MIVAE. Extensive experiments on eight real-world anomaly datasets demonstrate the superior performance of MIAEAD and MIVAE over conventional methods and state-of-the-art unsupervised models, by up to 6% in terms of AUC score. In addition, MIAEAD and MIVAE achieve a high AUC when applied to feature subsets with low heterogeneity, as measured by the coefficient of variation (CV) score.
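The per-subset scoring described in the abstract (reconstruction error as the anomaly score, one AUC per sub-dataset, maximum AUC selected) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a mean-vector "reconstructor" stands in for a trained sub-encoder, and the function names, toy data, and subset split are all hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that an anomaly outscores a normal sample (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_reconstructor(train_rows, columns):
    """Stand-in for a trained sub-encoder: always 'reconstructs' the training mean."""
    n = len(train_rows)
    return [sum(row[c] for row in train_rows) / n for c in columns]


def reconstruction_error(row, columns, reconstruction):
    """Mean squared error between one sample's feature subset and its reconstruction."""
    return sum((row[c] - r) ** 2 for c, r in zip(columns, reconstruction)) / len(columns)


# Hypothetical toy data: features 0-1 form subset "A", features 2-3 form subset "B".
train = [[0.0, 0.1, 5.0, 1.0], [0.1, 0.0, 1.0, 5.0], [0.0, 0.0, 3.0, 3.0]]
test = [
    [0.1, 0.1, 3.0, 3.0],
    [0.0, 0.1, 1.0, 5.0],
    [2.0, 2.0, 3.0, 3.0],
    [1.5, 1.8, 4.0, 2.0],
]
labels = [0, 0, 1, 1]  # 1 = anomaly; used only for evaluation, never for fitting
subsets = {"A": [0, 1], "B": [2, 3]}

auc_per_subset = {}
for name, cols in subsets.items():
    recon = fit_mean_reconstructor(train, cols)  # unsupervised: no labels involved
    scores = [reconstruction_error(row, cols, recon) for row in test]
    auc_per_subset[name] = auc(scores, labels)

# Select the sub-dataset with the maximum AUC.
best_subset = max(auc_per_subset, key=auc_per_subset.get)
```

Here the anomalies differ from normal samples only in features 0-1, so subset "A" separates them perfectly while subset "B" does not, and the maximum-AUC selection picks "A". Replacing the mean reconstructor with a real auto-encoder would leave the scoring and selection logic unchanged.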
Problem

Research questions and friction points this paper is trying to address.

Anomaly Detection
Complex Data
Efficiency Improvement

Innovation

Methods, ideas, or system contributions that make the work stand out.

MIAEAD
MIVAE
Anomaly Detection
Phai Vu Dinh
School of Electrical and Data Engineering, the University of Technology Sydney, Sydney, NSW 2007, Australia
Diep N. Nguyen
University of Technology Sydney
Mobile Computing, Communications and Networking, Wireless and Cyber Security, 5G/6G, Applied AI
D. Hoang
School of Electrical and Data Engineering, the University of Technology Sydney, Sydney, NSW 2007, Australia
Quang Uy Nguyen
Computer Science Department, Institute of Information and Communication Technology, Le Quy Don Technical University, Hanoi, Vietnam
E. Dutkiewicz
School of Electrical and Data Engineering, the University of Technology Sydney, Sydney, NSW 2007, Australia