🤖 AI Summary
To address alert overload in network intrusion detection systems (NIDS) and the limited interpretability of deep learning models, this paper proposes an LSTM-based framework for automated alert prioritization. It empirically compares four XAI methods (LIME, SHAP, Integrated Gradients, and DeepLIFT) on real-world Security Operations Center (SOC) alert logs. The explanations are assessed with a multidimensional evaluation framework covering faithfulness, complexity, robustness, and reliability. In the reported experiments, DeepLIFT consistently outperforms the other three methods, producing explanations with high faithfulness, low complexity, high robustness, and strong reliability. The features highlighted by the XAI methods also align closely with those that SOC analysts identified as essential for alert classification, which supports the trustworthiness and operational utility of the approach. The work offers an empirically grounded way to evaluate explainable AI for network threat response, helping to connect XAI research with practical cybersecurity deployment.
📝 Abstract
A Network Intrusion Detection System (NIDS) monitors networks for cyber attacks and other unwanted activities. However, NIDS solutions often generate an overwhelming number of alerts daily, making it challenging for analysts to identify and prioritize the most serious threats. While deep learning models promise to automate the prioritization of NIDS alerts, the lack of transparency in these models can undermine trust in their decision-making. This study highlights the critical need for explainable artificial intelligence (XAI) in NIDS alert classification to improve trust and interpretability. We used a real-world NIDS alert dataset from the Security Operations Center (SOC) of TalTech (Tallinn University of Technology) in Estonia to develop a Long Short-Term Memory (LSTM) model that prioritizes alerts. To explain the LSTM model's alert prioritization decisions, we implemented and compared four XAI methods: Local Interpretable Model-Agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), Integrated Gradients, and DeepLIFT. The quality of these XAI methods was assessed using a comprehensive framework that evaluated faithfulness, complexity, robustness, and reliability. Our results demonstrate that DeepLIFT consistently outperformed the other XAI methods, providing explanations with high faithfulness, low complexity, high robustness, and strong reliability. In collaboration with SOC analysts, we identified key features essential for effective alert classification. The strong alignment between these analyst-identified features and those obtained by the XAI methods supports their effectiveness and enhances the practical applicability of our approach.
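To make the idea of feature attribution concrete, the following is a minimal sketch of Integrated Gradients, one of the four XAI methods compared above. It is not the authors' implementation: the model is a toy logistic regression standing in for the LSTM, and the weights, input, and all-zeros baseline are hypothetical values chosen only for illustration. The sketch also checks the completeness property of the method, namely that the attributions sum to the difference between the model output at the input and at the baseline.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy stand-in for the alert classifier (assumption: logistic regression, not an LSTM)."""
    return sigmoid(x @ w + b)

def model_grad(x, w, b):
    """Analytic gradient of the model score with respect to the input features."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum along the
    straight-line path from the baseline to the input."""
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # shape: (steps, n_features)
    grads = np.array([model_grad(point, w, b) for point in path])
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical weights and a hypothetical alert feature vector (illustration only).
w = np.array([1.5, -2.0, 0.0, 0.7])
b = -0.3
x = np.array([1.0, 0.5, 3.0, 2.0])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w, b)

# Completeness check: attributions should sum to f(x) - f(baseline).
completeness_gap = abs(attributions.sum() - (model(x, w, b) - model(baseline, w, b)))
print("attributions:", attributions)
print("completeness gap:", completeness_gap)
```

The third feature has zero weight in this toy model, so its attribution is exactly zero regardless of its value. For a real LSTM the gradients would come from automatic differentiation rather than a closed-form expression.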