AnoMod: A Dataset for Anomaly Detection and Root Cause Analysis in Microservice Systems

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scarcity of high-quality, publicly available multimodal datasets for microservice systems, which has hindered research on anomaly detection and root cause analysis. To bridge this gap, the authors introduce AnoMod, a dataset constructed by injecting four categories of anomalies (performance-level, service-level, database-level, and code-level) into two open-source microservice benchmarks, SocialNetwork and TrainTicket. For each scenario, five modalities are collected: logs, metrics, distributed traces, API responses, and code coverage reports, giving an end-to-end view of system state and inter-service interactions. AnoMod enables evaluation of cross-modal anomaly detection and fusion/ablation strategies, as well as fine-grained root cause analysis across service and code regions, in support of end-to-end troubleshooting pipelines that consider detection and localization jointly.

📝 Abstract
Microservice systems (MSS) have become a predominant architectural style for cloud services. Yet the community still lacks high-quality, publicly available datasets for anomaly detection (AD) and root cause analysis (RCA) in MSS. Most benchmarks emphasize performance-related faults and provide only one or two monitoring modalities, limiting research on broader failure modes and cross-modal methods. To address these gaps, we introduce a new multimodal anomaly dataset built on two open-source microservice systems: SocialNetwork and TrainTicket. We design and inject four categories of anomalies (Ano): performance-level, service-level, database-level, and code-level, to emulate realistic anomaly modes. For each scenario, we collect five modalities (Mod): logs, metrics, distributed traces, API responses, and code coverage reports, offering a richer, end-to-end view of system state and inter-service interactions. We name our dataset, reflecting its unique properties, as AnoMod. This dataset enables (1) evaluation of cross-modal anomaly detection and fusion/ablation strategies, and (2) fine-grained RCA studies across service and code regions, supporting end-to-end troubleshooting pipelines that jointly consider detection and localization.
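
As a rough illustration of the design space the abstract describes, the sketch below enumerates the system and anomaly-category pairs and the modality subsets that a fusion/ablation study could iterate over. It assumes nothing about how AnoMod is actually stored or labeled: the `Scenario` class, the identifier strings, and both helper functions are hypothetical names chosen for this example, and the real dataset may contain several scenarios per category.

```python
from dataclasses import dataclass
from itertools import combinations

# Names taken from the abstract; the exact strings are illustrative only.
SYSTEMS = ("SocialNetwork", "TrainTicket")
ANOMALY_LEVELS = ("performance", "service", "database", "code")
MODALITIES = ("logs", "metrics", "traces", "api_responses", "code_coverage")


@dataclass(frozen=True)
class Scenario:
    """Hypothetical record: one anomaly category injected into one system."""
    system: str
    anomaly_level: str
    modalities: tuple = MODALITIES


def scenario_grid():
    """All (system, anomaly category) pairs; not the dataset's real layout."""
    return [Scenario(s, a) for s in SYSTEMS for a in ANOMALY_LEVELS]


def modality_subsets(min_size=1):
    """Every modality combination with at least `min_size` members."""
    return [
        combo
        for k in range(min_size, len(MODALITIES) + 1)
        for combo in combinations(MODALITIES, k)
    ]


if __name__ == "__main__":
    for scenario in scenario_grid():
        print(scenario.system, scenario.anomaly_level)
    print(len(modality_subsets()), "modality subsets available for ablation")
```

With five modalities there are 2^5 - 1 = 31 non-empty subsets, so an exhaustive ablation over modality combinations stays small enough to run per scenario.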
Problem

Research questions and friction points this paper is trying to address.

anomaly detection
root cause analysis
microservice systems
multimodal dataset
failure modes
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal dataset
anomaly detection
root cause analysis
microservice systems
cross-modal fusion
Ke Ping
Department of Computer Science, University of Helsinki, Helsinki, Finland
Hamza Bin Mazhar
Department of Computer Science, University of Helsinki, Helsinki, Finland
Yuqing Wang
Postdoc researcher, University of Helsinki
AIOps, software testing, software process improvement, natural language processing
Ying Song
University of Minnesota - Twin Cities
Geographic Information Science, Time Geography, Spatial-Temporal Analysis and Modeling, Transportation Geography
Mika V. Mäntylä
Department of Computer Science, University of Helsinki, Helsinki, Finland