🤖 AI Summary
To address the latency issue in fault prediction for distributed systems, this paper proposes a novel GRU-Attention-FFN cascaded model—first integrating self-attention into the GRU architecture to dynamically emphasize critical pre-fault time intervals. This design significantly enhances early detection of subtle anomalies under long-range temporal dependencies. The model synergistically combines gated recurrent units, learnable temporal attention, and a feed-forward network, optimized via binary classification loss with rigorous convergence analysis. Evaluated on a large-scale real-world cloud system dataset, the method achieves 94.2% accuracy and an AUC of 0.963, outperforming established baselines—including LSTM, TCN, and Prophet—by 12.6 percentage points in F1-score. Results demonstrate its effectiveness and state-of-the-art performance in low-latency fault预警.
📝 Abstract
This paper addresses the challenges of fault prediction and delayed response in distributed systems by proposing an intelligent prediction method based on temporal feature learning. The method takes multi-dimensional performance metric sequences as input. We use a Gated Recurrent Unit (GRU) to model the evolution of system states over time. An attention mechanism is then applied to enhance key temporal segments, improving the model's ability to identify potential faults. On this basis, a feedforward neural network is designed to perform the final classification, enabling early warning of system failures. To validate the effectiveness of the proposed approach, comparative experiments and ablation analyses were conducted using data from a large-scale real-world cloud system. The experimental results show that the model outperforms various mainstream time-series models in terms of Accuracy, F1-Score, and AUC. This demonstrates strong prediction capability and stability. Furthermore, the loss function curve confirms the convergence and reliability of the training process. It indicates that the proposed method effectively learns system behavior patterns and achieves efficient fault detection.