Time-Series Learning for Proactive Fault Prediction in Distributed Systems with Deep Neural Structures

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the latency issue in fault prediction for distributed systems, this paper proposes a GRU-Attention-FFN cascaded model that first integrates self-attention into the GRU architecture to dynamically emphasize critical pre-fault time intervals. This design is intended to improve early detection of subtle anomalies under long-range temporal dependencies. The model combines gated recurrent units, learnable temporal attention, and a feed-forward network, and is optimized with a binary classification loss accompanied by a convergence analysis. Evaluated on a large-scale real-world cloud system dataset, the method achieves 94.2% accuracy and an AUC of 0.963, outperforming established baselines, including LSTM, TCN, and Prophet, by 12.6 percentage points in F1-score. The results indicate that the method is effective for low-latency fault early warning.

📝 Abstract

This paper addresses the challenges of fault prediction and delayed response in distributed systems by proposing an intelligent prediction method based on temporal feature learning. The method takes multi-dimensional performance metric sequences as input. We use a Gated Recurrent Unit (GRU) to model the evolution of system states over time. An attention mechanism is then applied to enhance key temporal segments, improving the model's ability to identify potential faults. On this basis, a feedforward neural network is designed to perform the final classification, enabling early warning of system failures. To validate the effectiveness of the proposed approach, comparative experiments and ablation analyses were conducted using data from a large-scale real-world cloud system. The experimental results show that the model outperforms various mainstream time-series models in terms of Accuracy, F1-Score, and AUC. This demonstrates strong prediction capability and stability. Furthermore, the loss function curve confirms the convergence and reliability of the training process. It indicates that the proposed method effectively learns system behavior patterns and achieves efficient fault detection.
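The abstract reports results in terms of Accuracy, F1-Score, and AUC. As a reminder of how these three metrics are computed for a binary fault/no-fault classifier, here is a minimal standard-library sketch. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions and do not come from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = fault)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of windows classified correctly."""
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall on the fault class."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random fault window outscores a
    random healthy window (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = a fault followed the window, 0 = no fault.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]          # predicted fault probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores, which is one reason the three are usually reported together.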

Problem

Research questions and friction points this paper is trying to address.

Proactive fault prediction in distributed systems using deep learning

Modeling system state evolution with GRU and attention mechanisms

Improving fault detection accuracy and early warning capabilities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses GRU for temporal state evolution modeling

Applies attention mechanism to highlight key segments

Adds a feedforward network for final fault classification
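The three components listed above can be sketched as a single forward pass. This is a rough illustration written with the Python standard library only, not the authors' implementation. The page does not specify layer sizes, the exact attention form, or the input features, so the hidden size, the dot-product attention with a learnable scoring vector `v`, the one-hidden-layer ReLU head, the omission of bias terms, the random weights, and the three-metric input window are all assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def gru_step(x, h, p):
    """One GRU update (biases omitted): update gate z, reset gate r, candidate n."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b) for a, b in zip(matvec(p["Wn"], x), matvec(p["Un"], rh))]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def attention_pool(states, v):
    """Score each hidden state against v, softmax over time, return weighted sum."""
    scores = [sum(vi * hi for vi, hi in zip(v, h)) for h in states]
    m = max(scores)                                   # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * h[j] for w, h in zip(weights, states)) for j in range(len(v))]
    return context, weights


def ffn_head(context, p):
    """Feedforward classifier: one ReLU layer, then a sigmoid fault probability."""
    hidden = [max(0.0, a) for a in matvec(p["W1"], context)]
    logit = sum(w * a for w, a in zip(p["w2"], hidden))
    return sigmoid(logit)


def predict_fault(sequence, p):
    """Run GRU over the window, pool hidden states with attention, classify."""
    h = [0.0] * len(p["v"])
    states = []
    for x in sequence:
        h = gru_step(x, h, p)
        states.append(h)
    context, weights = attention_pool(states, p["v"])
    return ffn_head(context, p), weights


def binary_cross_entropy(prob, label, eps=1e-12):
    """Binary classification loss for a single window."""
    prob = min(max(prob, eps), 1 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def init_params(n_features, n_hidden, seed=0):
    """Random, untrained weights; purely for demonstrating the data flow."""
    rng = random.Random(seed)
    p = {k: rand_matrix(n_hidden, n_features, rng) for k in ("Wz", "Wr", "Wn")}
    p.update({k: rand_matrix(n_hidden, n_hidden, rng) for k in ("Uz", "Ur", "Un")})
    p["v"] = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    p["W1"] = rand_matrix(n_hidden, n_hidden, rng)
    p["w2"] = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return p


params = init_params(n_features=3, n_hidden=4)
# Hypothetical window of three time steps, each with three normalized metrics.
window = [[0.2, 0.5, 0.1], [0.3, 0.6, 0.1], [0.9, 0.9, 0.7]]
prob, weights = predict_fault(window, params)
loss = binary_cross_entropy(prob, label=1)
print(f"fault probability={prob:.3f}, attention weights={weights}, loss={loss:.3f}")
```

The attention weights sum to one across time steps, so after training they could be read as an indication of which parts of the window the classifier relied on. With the untrained weights used here they carry no meaning.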
