🤖 AI Summary
Existing log-based anomaly detection (LAD) methods face two key limitations in cloud-system attack detection: (1) their training datasets cover only a narrow range of behaviors and lack a holistic, system-level view; and (2) they ignore the distribution shifts inherent to cloud-native environments, such as application updates, version upgrades, and infrastructure reconfigurations, leading to poor robustness. To address this, we propose CAShift, the first *normality-shift-aware* benchmark for attack detection in cloud logs. CAShift systematically models three realistic shift types across 20 diverse attack scenarios spanning various cloud components. Combining log parsing, a shift-aware evaluation framework, continual-learning adaptation, and multi-dimensional distribution modeling, it enables the first quantitative assessment of LAD methods under distribution shift. Experiments show that state-of-the-art LAD methods suffer F1-score drops of up to 34% under shifts, while integrating continual learning recovers up to 27% in F1-score, empirically validating the efficacy of shift-adaptation mechanisms.
📝 Abstract
With the rapid advancement of cloud-native computing, securing cloud environments has become a critical task. Log-based Anomaly Detection (LAD) is the most widely used technique for attack detection and safety assurance across different systems, and multiple LAD methods and associated datasets have been proposed. However, even the datasets prepared specifically for cloud systems cover only limited cloud behaviors and lack information from a whole-system perspective. A further critical issue is normality shift: the test distribution may differ from the training distribution, which strongly degrades LAD performance. Unfortunately, existing work addresses only simple shift types such as chronological changes, while other important, cloud-specific shift types are ignored, e.g., the distribution shift introduced by different deployed cloud architectures. A new dataset covering diverse cloud-system behaviors and normality-shift types is therefore needed. To fill this gap in evaluating LAD under real-world conditions, we present CAShift, the first normality-shift-aware dataset for cloud systems. CAShift captures three shift types, namely application, version, and cloud-architecture shifts, and includes 20 diverse attack scenarios across various cloud components. Using CAShift, we conduct an empirical study showing that (1) all LAD methods are significantly affected by normality shifts, with performance drops of up to 34%, and (2) continual learning techniques can improve F1-scores by up to 27%, depending on data usage and algorithm choice. Based on our findings, we offer implications for future research on designing more robust LAD models and methods for LAD shift adaptation.
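To make the normality-shift problem concrete, here is a minimal, hypothetical sketch (not CAShift's actual pipeline or any of its evaluated detectors): a toy detector trained on pre-shift log templates misflags benign post-shift behavior as anomalous, and a continual-learning-style update with newly collected normal data restores its F1-score. All template names and sequences below are invented for illustration.

```python
from typing import Iterable, List, Set


class TemplateSetDetector:
    """Toy LAD baseline: flags a log sequence as anomalous if it
    contains any template unseen during training. Purely illustrative."""

    def __init__(self) -> None:
        self.known: Set[str] = set()

    def fit(self, sequences: Iterable[List[str]]) -> None:
        # Continual updates are just further calls to fit().
        for seq in sequences:
            self.known.update(seq)

    def predict(self, seq: List[str]) -> bool:  # True = anomaly
        return any(t not in self.known for t in seq)


def f1(preds: List[bool], labels: List[bool]) -> float:
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum((not p) and l for p, l in zip(preds, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Pre-shift normal behavior uses templates A-C; attacks emit template X.
train_normal = [["A", "B"], ["B", "C"], ["A", "C"]]
det = TemplateSetDetector()
det.fit(train_normal)

# After an application update (a normality shift), template D is also normal.
shifted_test = [["A", "D"], ["B", "D"], ["A", "X"], ["C", "X"]]
labels = [False, False, True, True]

before = [det.predict(s) for s in shifted_test]
# Benign post-shift sequences containing D are misflagged, so F1 drops.

# Continual-learning-style adaptation: fold in post-shift normal data.
det.fit([["A", "D"]])
after = [det.predict(s) for s in shifted_test]
# The false positives disappear while the attacks remain detected.
```

The same train-on-pre-shift, test-on-post-shift protocol underlies the benchmark's evaluation setting, though the paper's detectors and adaptation algorithms are far more sophisticated than this frequency-free set membership check.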