🤖 AI Summary
Existing multimodal failure diagnosis methods for microservice systems fuse logs, metrics, and traces indiscriminately and ignore how well each modality suits each diagnostic task, which leads to inaccurate root cause localization and failure type identification. This paper proposes TVDiag, a task-oriented multimodal failure diagnosis framework. The method introduces: (1) task-oriented learning that enhances the advantages of each modality for a given diagnostic subtask (e.g., culprit instance localization vs. failure type classification); (2) cross-modal contrastive learning that extracts view-invariant failure information; and (3) a graph-level data augmentation strategy that mitigates the shortage of training data by randomly inactivating the observability of some normal microservice instances during training. Across two datasets, the framework achieves at least a 55.94% higher Hit Rate@1 (HR@1) and over a 4.08% increase in F1-score compared with state-of-the-art approaches.
📝 Abstract
Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale. With the rapid growth of observability techniques, various methods have been proposed to achieve failure diagnosis, including root cause localization and failure type identification, by leveraging diverse monitoring data such as logs, metrics, or traces. However, traditional failure diagnosis methods that use single-modal data can hardly cover all failure scenarios due to the restricted information. Several failure diagnosis methods have been recently proposed to integrate multimodal data based on deep learning. These methods, however, tend to combine modalities indiscriminately and treat them equally in failure diagnosis, ignoring the relationship between specific modalities and different diagnostic tasks. This oversight hinders the effective utilization of the unique advantages offered by each modality. To address this limitation, we propose \textit{TVDiag}, a multimodal failure diagnosis framework for locating culprit microservice instances and identifying their failure types (e.g., Net-packets Corruption) in microservice-based systems. \textit{TVDiag} employs task-oriented learning to enhance the potential advantages of each modality and establishes cross-modal associations based on contrastive learning to extract view-invariant failure information. Furthermore, we develop a graph-level data augmentation strategy that randomly inactivates the observability of some normal microservice instances during training to mitigate the shortage of training data. Experimental results show that \textit{TVDiag} outperforms state-of-the-art methods in multimodal failure diagnosis, achieving at least a 55.94% higher $HR@1$ accuracy and over a 4.08% increase in F1-score across two datasets.
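The graph-level augmentation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each microservice instance is represented by a plain feature vector, and it assumes that "inactivating observability" can be approximated by zeroing that vector. The function name `mask_normal_instances`, the `mask_ratio` parameter, and the instance names are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Set


def mask_normal_instances(
    features: Dict[str, List[float]],
    culprits: Set[str],
    mask_ratio: float = 0.2,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[float]]:
    """Return a copy of `features` with a random subset of normal instances zeroed.

    Assumption: zeroing a feature vector stands in for "inactivating the
    observability" of an instance; the paper may realise this differently.
    """
    rng = rng or random.Random()
    # Only instances that are NOT labelled as culprits are eligible for masking,
    # so the failure signal itself is never removed from a training sample.
    normal = sorted(name for name in features if name not in culprits)
    n_mask = int(len(normal) * mask_ratio)
    masked = set(rng.sample(normal, n_mask))
    # Build a new dict so the original sample is left untouched.
    return {
        name: [0.0] * len(vec) if name in masked else list(vec)
        for name, vec in features.items()
    }


# Hypothetical failure case: one culprit instance and four normal ones.
features = {
    "cart-0": [0.9, 0.1],
    "cart-1": [0.2, 0.3],
    "payment-0": [0.4, 0.5],
    "search-0": [0.6, 0.7],
    "search-1": [0.8, 0.2],
}
augmented = mask_normal_instances(
    features, culprits={"cart-0"}, mask_ratio=0.5, rng=random.Random(0)
)
```

Calling the function repeatedly with different random seeds yields several augmented variants of the same labelled failure case, which is the sense in which such a strategy could ease a shortage of training data.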