Dynamic Reward Scaling for Multivariate Time Series Anomaly Detection: A VAE-Enhanced Reinforcement Learning Approach

📅 2025-11-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address anomaly detection in high-dimensional, label-scarce multivariate time series, this paper proposes an end-to-end reinforcement learning framework integrating a variational autoencoder (VAE), an LSTM-enhanced deep Q-network (DQN), and uncertainty-driven active learning. The method introduces a dynamic reward scaling mechanism that adaptively balances reconstruction-error and classification signals; employs the VAE to learn robust latent representations; uses the LSTM-DQN to make sequential anomaly classification decisions; and applies active learning to reduce annotation costs. Evaluated on the SMD and WADI industrial datasets, the framework improves F1-score and area under the precision-recall curve (AU-PR) over existing baselines. The results indicate that it is effective at noise suppression, temporal dependency modeling, and learning from few labels.
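The uncertainty-driven selection step can be illustrated with a small margin-sampling sketch. This is only one common way to measure uncertainty and is an assumption here, not necessarily the paper's criterion: the function name `select_uncertain`, the two-column Q-value layout (normal, anomaly), and the `budget` parameter are all hypothetical.

```python
def select_uncertain(q_values, budget):
    """Return indices of the `budget` samples whose two Q-values are closest.

    Assumption (not from the paper): each row is [Q(normal), Q(anomaly)],
    and a small gap between the two means the agent is unsure.
    """
    # Margin = absolute gap between the two action values for each sample.
    margins = [abs(q_normal - q_anomaly) for q_normal, q_anomaly in q_values]
    # Sort sample indices by ascending margin (most uncertain first).
    ranked = sorted(range(len(margins)), key=lambda i: margins[i])
    return ranked[:budget]


# Toy Q-values for four windows; rows 1 and 3 have nearly tied actions.
q = [[2.0, 0.1], [0.6, 0.5], [0.2, 1.9], [1.0, 1.2]]
picked = select_uncertain(q, budget=2)
```

In this toy case the two windows with nearly tied Q-values are the ones sent for labeling, while the confidently classified windows are left unlabeled.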

📝 Abstract

Detecting anomalies in multivariate time series is essential for monitoring complex industrial systems, where high dimensionality, limited labeled data, and subtle dependencies between sensors pose significant challenges. This paper presents a deep reinforcement learning framework that combines a Variational Autoencoder (VAE), an LSTM-based Deep Q-Network (DQN), dynamic reward shaping, and an active learning module to address these issues in a unified learning framework. The main contribution is the implementation of Dynamic Reward Scaling for Multivariate Time Series Anomaly Detection (DRSMT), which demonstrates how each component enhances the detection process. The VAE captures compact latent representations and reduces noise. The DQN enables adaptive, sequential anomaly classification, and the dynamic reward shaping balances exploration and exploitation during training by adjusting the relative importance of the reconstruction and classification signals. In addition, active learning identifies the most uncertain samples for labeling, reducing the need for extensive manual supervision. Experiments on two multivariate benchmarks, the Server Machine Dataset (SMD) and the Water Distribution Testbed (WADI), show that the proposed method outperforms existing baselines in F1-score and AU-PR. These results highlight the effectiveness of combining generative modeling, reinforcement learning, and selective supervision for accurate and scalable anomaly detection in real-world multivariate systems.
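To make the reward-scaling idea concrete, here is a minimal sketch of how a classification reward and a VAE reconstruction-error term might be combined under an adjustable coefficient. The abstract does not give the formula, so everything here is an assumption for illustration: the additive form, the +1/-1 classification reward, the proportional update rule, and the names `shaped_reward`, `update_scale`, and `target_reward`.

```python
def shaped_reward(action, label, recon_error, scale):
    """Classification reward plus a scaled reconstruction-error term.

    Assumptions (not from the paper): +1 for a correct decision, -1 for a
    wrong one, and an additive VAE term weighted by `scale`.
    """
    classification = 1.0 if action == label else -1.0
    return classification + scale * recon_error


def update_scale(scale, episode_reward, target_reward,
                 step=0.1, lower=0.0, upper=1.0):
    """Hypothetical proportional update of the scaling coefficient.

    The coefficient grows while the episode reward is below the target
    and shrinks once the target is exceeded; it is clamped to
    [lower, upper].
    """
    scale += step * (target_reward - episode_reward)
    return min(upper, max(lower, scale))


# One correct decision on a window with reconstruction error 0.5.
r = shaped_reward(action=1, label=1, recon_error=0.5, scale=0.2)
# Episode reward fell short of the target, so the coefficient rises.
new_scale = update_scale(0.5, episode_reward=2.0, target_reward=3.0)
```

With a rule of this kind, a larger coefficient makes the reconstruction signal weigh more heavily in the reward, and a smaller one leaves the classification signal dominant.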

Problem

Research questions and friction points this paper is trying to address.

Detecting anomalies in multivariate time series with limited labeled data

Addressing high dimensionality and subtle sensor dependencies in industrial systems

Balancing exploration and exploitation through dynamic reward scaling

Innovation

Methods, ideas, or system contributions that make the work stand out.

VAE captures compact latent representations and reduces noise

DQN enables adaptive sequential anomaly classification

Dynamic reward shaping balances exploration and exploitation
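For reference, the two building blocks named above have well-known textbook objectives, shown below in their standard forms. The paper's exact losses are not given on this page and may differ; the symbols ($\phi$, $\theta$, $\theta^{-}$, $\gamma$) follow common convention and are not taken from the paper.

```latex
% Standard VAE objective (negative ELBO): reconstruction term plus KL regularizer
\mathcal{L}_{\mathrm{VAE}}(x)
  = -\,\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
    + D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)

% Standard DQN target and squared temporal-difference loss
y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^{-}),
\qquad
\mathcal{L}_{\mathrm{DQN}} = \big(y_t - Q(s_t, a_t; \theta)\big)^2
```

Here $r_t$ is the reward received at step $t$ and $\theta^{-}$ denotes the parameters of a periodically updated target network.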

🔎 Similar Papers

TeVAE: A Variational Autoencoder Approach for Discrete Online Anomaly Detection in Variable-state Multivariate Time-series Data