Modeling Anomaly Detection in Cloud Services: Analysis of the Properties that Impact Latency and Resource Consumption

📅 2025-11-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the latency–resource trade-off in performance anomaly detection for cloud services. The authors propose a modeling and quantitative analysis framework based on Stochastic Reward Nets (SRNs) to formally characterize how detection precision, recall, and inspection frequency jointly influence service latency and computational overhead. The analysis reveals a dynamic trade-off: precision dominates the performance–cost balance under high-frequency inspection, whereas recall becomes the critical bottleneck under low-frequency inspection. Experimental evaluation demonstrates that the model accurately quantifies the aggregate cost of diverse detection strategies and provides a theoretical foundation for adaptive parameter tuning. The primary contribution is a systematic formal modeling and empirical validation of the nonlinear trade-offs among anomaly detection parameters in cloud environments, enabling cost-effective, intelligent monitoring strategy design.

📝 Abstract
Detecting and resolving performance anomalies in Cloud services is crucial for maintaining desired performance objectives. Scaling actions triggered by an anomaly detector help achieve target latency at the cost of extra resource consumption. However, performance anomaly detectors make mistakes. This paper studies which characteristics of performance anomaly detection are important to optimize the trade-off between performance and cost. Using Stochastic Reward Nets, we model a Cloud service monitored by a performance anomaly detector. Using our model, we study the impact of detector characteristics, namely precision, recall, and inspection frequency, on the average latency and resource consumption of the monitored service. Our results show that achieving both high precision and high recall is not always necessary. If detection can be run frequently, high precision is enough to obtain a good performance-to-cost trade-off, but if the detector is run infrequently, recall becomes the most important characteristic.
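The trade-off described in the abstract can be illustrated with a toy discrete-time simulation. This is a minimal sketch under simplified assumptions, not the paper's Stochastic Reward Net model: anomalies arrive with a fixed per-step probability, a detector with given precision and recall runs every `inspect_every` steps, true positives trigger a scaling action that clears the anomaly, false positives trigger a needless scaling action, and every step spent in an anomalous state accrues a latency penalty. The false-positive rate derived from precision is a rough approximation introduced here for illustration.

```python
import random

def simulate(precision, recall, inspect_every,
             steps=100_000, p_anomaly=0.01, seed=0):
    """Toy sketch (not the paper's SRN model) of how detector precision,
    recall, and inspection frequency shape average latency vs. cost.

    Returns (latency_penalty_rate, scaling_rate): the fraction of time
    steps spent with degraded latency, and scaling actions per step.
    """
    rng = random.Random(seed)
    anomalous = False
    latency_penalty = 0   # steps spent with degraded latency
    scale_actions = 0     # proxy for extra resource consumption
    for t in range(steps):
        # an anomaly may start at any step
        if not anomalous and rng.random() < p_anomaly:
            anomalous = True
        # the detector only inspects every `inspect_every` steps
        if t % inspect_every == 0:
            if anomalous:
                if rng.random() < recall:  # true positive: scale and recover
                    scale_actions += 1
                    anomalous = False
            else:
                # false-positive rate implied by precision = TP / (TP + FP),
                # approximating the per-inspection TP rate by p_anomaly * recall
                fp_rate = p_anomaly * recall * (1 - precision) / max(precision, 1e-9)
                if rng.random() < fp_rate:  # false positive: scale needlessly
                    scale_actions += 1
        if anomalous:
            latency_penalty += 1
    return latency_penalty / steps, scale_actions / steps
```

Running the sketch with, say, `simulate(0.9, 0.9, 1)` versus `simulate(0.9, 0.9, 50)` shows latency degrading as inspections become infrequent, while lowering recall at a fixed high frequency has a similar but weaker effect, consistent with the abstract's claim that the critical characteristic depends on inspection frequency.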
Problem

Research questions and friction points this paper is trying to address.

Optimizing the trade-off between cloud performance and cost
Analyzing the impact of detector precision and recall
Determining the optimal anomaly detection frequency for efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Stochastic Reward Nets to model cloud services
Analyzes how precision, recall, and inspection frequency affect performance
Shows high precision suffices when detection runs frequently