Time-Series Learning for Proactive Fault Prediction in Distributed Systems with Deep Neural Structures

📅 2025-05-27
🤖 AI Summary
To address the latency issue in fault prediction for distributed systems, this paper proposes a novel cascaded GRU-Attention-FFN model that integrates self-attention into the GRU architecture to dynamically emphasize critical pre-fault time intervals. This design significantly enhances early detection of subtle anomalies under long-range temporal dependencies. The model combines gated recurrent units, learnable temporal attention, and a feed-forward network, and is optimized with a binary classification loss, accompanied by rigorous convergence analysis. Evaluated on a large-scale real-world cloud system dataset, the method achieves 94.2% accuracy and an AUC of 0.963, outperforming established baselines, including LSTM, TCN, and Prophet, by 12.6 percentage points in F1-score. The results demonstrate its effectiveness and state-of-the-art performance in low-latency fault early warning.
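
For readers less familiar with the three metrics quoted in the summary, the following is a minimal sketch of how accuracy, F1-score, and AUC are commonly computed for a binary fault/no-fault classifier. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions; they are not taken from the paper and do not reproduce its results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (fault) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = fault occurred, 0 = no fault.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores and is threshold-free, which is one reason the two kinds of metric are usually reported together.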

📝 Abstract
This paper addresses the challenges of fault prediction and delayed response in distributed systems by proposing an intelligent prediction method based on temporal feature learning. The method takes multi-dimensional performance metric sequences as input. We use a Gated Recurrent Unit (GRU) to model the evolution of system states over time. An attention mechanism is then applied to enhance key temporal segments, improving the model's ability to identify potential faults. On this basis, a feedforward neural network is designed to perform the final classification, enabling early warning of system failures. To validate the effectiveness of the proposed approach, comparative experiments and ablation analyses were conducted using data from a large-scale real-world cloud system. The experimental results show that the model outperforms various mainstream time-series models in terms of Accuracy, F1-Score, and AUC. This demonstrates strong prediction capability and stability. Furthermore, the loss function curve confirms the convergence and reliability of the training process. It indicates that the proposed method effectively learns system behavior patterns and achieves efficient fault detection.
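
The pipeline described in the abstract (a GRU over the metric sequence, attention over the resulting hidden states, then a feedforward classifier) can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the single learned scoring vector used for attention, the ReLU hidden layer, and the random untrained weights are all hypothetical choices, since the page does not give these details.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_features, hidden, ffn_hidden, seed=0):
    """Random, untrained weights; all sizes are illustrative assumptions."""
    rng = np.random.default_rng(seed)

    def w(*shape):
        return rng.normal(0.0, 0.1, size=shape)

    return {
        # GRU weights for the update gate (z), reset gate (r), candidate state (n)
        "Wz": w(n_features, hidden), "Uz": w(hidden, hidden), "bz": np.zeros(hidden),
        "Wr": w(n_features, hidden), "Ur": w(hidden, hidden), "br": np.zeros(hidden),
        "Wn": w(n_features, hidden), "Un": w(hidden, hidden), "bn": np.zeros(hidden),
        # Attention: one learned scoring vector (an assumed, simple form)
        "v": w(hidden),
        # Feedforward classifier: hidden layer plus a scalar output
        "W1": w(hidden, ffn_hidden), "b1": np.zeros(ffn_hidden),
        "W2": w(ffn_hidden), "b2": 0.0,
    }


def gru_forward(x, p):
    """Run a GRU over x of shape (T, n_features); return states of shape (T, hidden)."""
    h = np.zeros(p["Uz"].shape[0])
    states = []
    for x_t in x:
        z = sigmoid(x_t @ p["Wz"] + h @ p["Uz"] + p["bz"])        # update gate
        r = sigmoid(x_t @ p["Wr"] + h @ p["Ur"] + p["br"])        # reset gate
        n = np.tanh(x_t @ p["Wn"] + (r * h) @ p["Un"] + p["bn"])  # candidate state
        h = (1.0 - z) * n + z * h
        states.append(h)
    return np.stack(states)


def attention_pool(states, v):
    """Softmax-weight the time steps and return the weighted sum of hidden states."""
    scores = states @ v
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ states, weights


def predict_fault_probability(x, p):
    """GRU -> attention pooling -> feedforward classifier with a sigmoid output."""
    states = gru_forward(x, p)
    context, weights = attention_pool(states, p["v"])
    hidden = np.maximum(0.0, context @ p["W1"] + p["b1"])  # ReLU (assumed)
    prob = sigmoid(hidden @ p["W2"] + p["b2"])
    return float(prob), weights


# Usage with synthetic data: 20 time steps of 4 performance metrics.
params = init_params(n_features=4, hidden=8, ffn_hidden=16)
window = np.random.default_rng(1).normal(size=(20, 4))
prob, weights = predict_fault_probability(window, params)
print(prob, weights.argmax())
```

The attention weights sum to one across time steps, so inspecting them shows which segments of the input window contribute most to the prediction. With untrained weights the output probability is not meaningful; training would minimize a binary classification loss over labeled windows.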
Problem

Research questions and friction points this paper is trying to address.

Proactive fault prediction in distributed systems using deep learning
Modeling system state evolution with GRU and attention mechanisms
Improving fault detection accuracy and early warning capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses GRU for temporal state evolution modeling
Applies attention mechanism to highlight key segments
Adds a feedforward network for final fault classification
Yang Wang
University of Michigan, Ann Arbor, USA
Wenxuan Zhu
MS/PhD KAUST
Xuehui Quan
University of Washington, Seattle, USA
Heyi Wang
Illinois Institute of Technology, Chicago, USA
Chang Liu
Washington University in St. Louis, St. Louis, USA
Qiyuan Wu
University of California, San Diego
Machine Learning, Biology, Artificial Intelligence