🤖 AI Summary
To address the limited anomaly detection performance in non-stationary videos, such as aerial remote sensing sequences, this paper explicitly models temporal non-stationarity for the first time. We propose the Temporal Recursive Differencing Network (TRDN), which embeds differencing-based preprocessing into a deep predictive backbone and couples it with autoregressive moving-average (ARMA) estimation for dynamic statistical modeling. TRDN further integrates optical flow-based feature extraction with a prediction-error-driven anomaly scoring mechanism. Evaluated on three aerial video datasets and two standard video anomaly detection (VAD) benchmarks, our method achieves state-of-the-art (SOTA) performance, surpassing existing approaches in both Equal Error Rate (EER) and Area Under the Curve (AUC). It also substantially improves robustness to time-varying feature distributions and the accuracy of fine-grained anomaly localization.
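The core idea can be sketched in a few lines: extract a simple per-frame optical flow feature, difference it in time to suppress non-stationarity, and flag frames whose ARMA one-step prediction error is large. The snippet below is an illustrative sketch of that pipeline, not the authors' implementation; the feature choice, window size, and ARMA order (`flow_magnitude_series`, `window=64`, `order=(2, 0, 2)`) are assumptions made here for concreteness.

```python
# Minimal sketch (not the paper's released code): per-frame optical-flow magnitude
# as the video feature, first-order temporal differencing to reduce nonstationarity,
# and a sliding-window ARMA one-step forecast whose residual is the anomaly score.
import cv2
import numpy as np
from statsmodels.tsa.arima.model import ARIMA


def flow_magnitude_series(frames):
    """Mean Farneback optical-flow magnitude for each consecutive frame pair."""
    feats = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        feats.append(np.linalg.norm(flow, axis=2).mean())
        prev = curr
    return np.asarray(feats)


def anomaly_scores(feature_series, window=64, order=(2, 0, 2)):
    """One-step ARMA prediction error on the differenced feature series."""
    diffed = np.diff(feature_series)          # differencing handles nonstationarity
    scores = np.zeros_like(diffed)
    for t in range(window, len(diffed)):
        history = diffed[t - window:t]
        model = ARIMA(history, order=order)   # ARMA(2, 2): d=0, differencing done above
        pred = model.fit().forecast(steps=1)[0]
        scores[t] = abs(diffed[t] - pred)     # large residual => likely anomaly
    return scores
```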
📝 Abstract
Most videos, including those captured through aerial remote sensing, are nonstationary in nature, with time-varying feature statistics. Although sophisticated reconstruction and prediction models exist for video anomaly detection (VAD), effective handling of nonstationarity has seldom been considered explicitly. In this letter, we propose to perform prediction using a time-recursive differencing network followed by autoregressive moving average estimation for VAD. The differencing network is employed to handle nonstationarity in video data effectively during anomaly detection. Focusing on the prediction process, we demonstrate the effectiveness of the proposed approach using a simple optical flow-based video feature, with qualitative and quantitative results on three aerial video datasets and two standard anomaly detection video datasets. Comparisons with several existing methods, including the state of the art, in terms of equal error rate (EER), area under the curve (AUC), and ROC curves reveal the superiority of the proposed approach.
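For reference, the frame-level AUC and EER figures used in such comparisons are conventionally computed from per-frame anomaly scores and binary ground-truth labels as sketched below; this is the standard VAD evaluation recipe, not code from the letter.

```python
# Generic sketch of frame-level VAD evaluation: AUC from the ROC, and EER as the
# operating point where false-positive and false-negative rates are equal.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def frame_level_auc_eer(labels, scores):
    """labels: 1 for anomalous frames, 0 for normal; scores: per-frame anomaly scores."""
    auc = roc_auc_score(labels, scores)
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    eer = fpr[np.nanargmin(np.abs(fnr - fpr))]  # point where FPR ~= FNR
    return auc, eer
```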