A Decentralized Root Cause Localization Approach for Edge Computing Environments

📅 2025-11-16
🤖 AI Summary
Microservice-based IoT applications at the edge are prone to anomalies that propagate quickly across dependent services, and existing centralized root cause localization (RCL) methods add latency and communication overhead when applied there. This paper proposes a decentralized RCL framework that combines three components: (1) communication- and colocation-aware microservice clustering; (2) a lightweight inter-cluster peer-to-peer approximation process for anomalies that cross cluster boundaries; and (3) an anomaly scoring mechanism tailored to the diverse triggers that arise across the microservice, device, and network layers. Using Personalized PageRank (PPR), the framework runs graph analysis locally at the edge device level and coordinates across clusters only when needed. On the MicroCERCL dataset, the approach achieves localization accuracy comparable to or higher than its centralized counterpart while reducing localization time by up to 34%, which suggests that decentralized graph-based RCL is practical in resource-constrained edge environments.

📝 Abstract
Edge computing environments host increasingly complex microservice-based IoT applications, which are prone to performance anomalies that can propagate across dependent services. Identifying the true source of such anomalies, known as Root Cause Localization (RCL), is essential for timely mitigation. However, existing RCL approaches are designed for cloud environments and rely on centralized analysis, which increases latency and communication overhead when applied at the edge. This paper proposes a decentralized RCL approach that executes localization directly at the edge device level using the Personalized PageRank (PPR) algorithm. The proposed method first groups microservices into communication- and colocation-aware clusters, thereby confining most anomaly propagation within cluster boundaries. Within each cluster, PPR is executed locally to identify the root cause, significantly reducing localization time. For the rare cases where anomalies propagate across clusters, we introduce an inter-cluster peer-to-peer approximation process, enabling lightweight coordination among clusters with minimal communication overhead. To enhance the accuracy of localization in heterogeneous edge environments, we also propose a novel anomaly scoring mechanism tailored to the diverse anomaly triggers that arise across microservice, device, and network layers. Evaluation results on the publicly available edge dataset, MicroCERCL, demonstrate that the proposed decentralized approach achieves comparable or higher localization accuracy than its centralized counterpart while reducing localization time by up to 34%. These findings highlight that decentralized graph-based RCL can provide a practical and efficient solution for anomaly diagnosis in resource-constrained edge environments.
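The local PPR step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the service names, the anomaly scores, the caller-to-callee edge direction, the damping factor `alpha=0.85`, and the handling of services with no outgoing calls are all hypothetical choices made for the example.

```python
def personalized_pagerank(edges, anomaly_scores, alpha=0.85, max_iter=200, tol=1e-12):
    """Power-iteration PPR over a directed caller -> callee graph.

    The restart distribution is the normalized anomaly score of each
    service, so the walk keeps returning to services that look anomalous.
    """
    nodes = sorted(set(anomaly_scores) | {n for edge in edges for n in edge})
    successors = {n: [] for n in nodes}
    for caller, callee in edges:
        successors[caller].append(callee)

    total = sum(anomaly_scores.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one service needs a positive anomaly score")
    restart = {n: anomaly_scores.get(n, 0.0) / total for n in nodes}

    rank = dict(restart)
    for _ in range(max_iter):
        # Restart mass: with probability (1 - alpha) jump back to the
        # anomaly-weighted distribution.
        nxt = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            if successors[n]:
                # Follow outgoing calls with equal probability.
                share = alpha * rank[n] / len(successors[n])
                for m in successors[n]:
                    nxt[m] += share
            else:
                # Assumption: a service with no outgoing calls hands its
                # mass back to the restart distribution.
                for m in nodes:
                    nxt[m] += alpha * rank[n] * restart[m]
        delta = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank


# Hypothetical single cluster: gateway calls orders and auth, orders calls db.
edges = [("gateway", "orders"), ("gateway", "auth"), ("orders", "db")]
anomaly_scores = {"gateway": 0.3, "orders": 0.5, "db": 0.9, "auth": 0.05}

ranks = personalized_pagerank(edges, anomaly_scores)
suspects = sorted(ranks, key=ranks.get, reverse=True)
print(suspects)
```

In this toy cluster the walk concentrates on `db`, the most downstream service with the highest anomaly score, so it is reported as the top root cause candidate, followed by `orders`, `gateway`, and `auth`. Because the graph is restricted to one cluster, the computation needs no data from other edge devices.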
Problem

Research questions and friction points this paper is trying to address.

Localizing the root causes of performance anomalies in microservice-based IoT applications at the edge
Reducing the latency and communication overhead that centralized, cloud-oriented RCL approaches incur at the edge
Handling anomalies that propagate across clusters of edge microservices
Innovation

Methods, ideas, or system contributions that make the work stand out.

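Decentralized root cause localization that runs the Personalized PageRank (PPR) algorithm locally at the edge device level
Grouping microservices into communication- and colocation-aware clusters to confine most anomaly propagation within cluster boundaries
Novel anomaly scoring mechanism tailored to anomaly triggers across the microservice, device, and network layers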
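
One way to picture communication- and colocation-aware clustering is the sketch below. It is an illustrative assumption rather than the paper's algorithm: the affinity formula (normalized call rate plus a bonus for services on the same device), the `colocation_bonus` and `threshold` values, and all service and device names are hypothetical.

```python
from collections import defaultdict


def cluster_services(call_rates, placement, colocation_bonus=0.5, threshold=0.6):
    """Merge services whose pairwise affinity reaches a threshold.

    call_rates: {(caller, callee): calls per second}
    placement:  {service: device hosting it}
    """
    services = sorted(placement)
    parent = {s: s for s in services}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    max_rate = max(call_rates.values(), default=1.0) or 1.0
    for (a, b), rate in call_rates.items():
        affinity = rate / max_rate           # communication term
        if placement[a] == placement[b]:
            affinity += colocation_bonus     # colocation term
        if affinity >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for s in services:
        clusters[find(s)].add(s)
    return sorted(sorted(members) for members in clusters.values())


placement = {
    "gateway": "edge-1", "auth": "edge-1",
    "orders": "edge-2", "db": "edge-2",
    "metrics": "edge-3",
}
call_rates = {
    ("gateway", "auth"): 40, ("gateway", "orders"): 30,
    ("orders", "db"): 100, ("orders", "metrics"): 5,
}

clusters = cluster_services(call_rates, placement)
print(clusters)
```

With these inputs, `gateway` and `auth` end up in one cluster and `orders` and `db` in another, while `metrics` stays alone. The `gateway` to `orders` call crosses a cluster boundary, which is the kind of edge where an inter-cluster coordination step would be needed.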