🤖 AI Summary
This paper examines label imbalance in network intrusion alert classification within automated Security Operations Centers (SOCs), using the state-of-the-art method DeepCASE as its use case. We find that label imbalance affects both classification performance and the correctness of the explanations DeepCASE offers for its classifications. We conclude that tuning the detection rules already used in SOCs can significantly reduce imbalance and may benefit the performance and explainability of alert post-processing methods such as DeepCASE. The findings suggest that traditional methods for improving input data quality can benefit automation.
📝 Abstract
Automation in Security Operations Centers (SOCs) plays a prominent role in alert classification and incident escalation. However, automated methods must be robust to imbalanced input data, which can negatively affect performance. Automated methods should also make explainable decisions. In this work, we evaluate the effect of label imbalance on the classification of network intrusion alerts. As our use case, we employ DeepCASE, the state-of-the-art method for automated alert classification. We show that label imbalance impacts both the classification performance and the correctness of the classification explanations offered by DeepCASE. We conclude that tuning the detection rules used in SOCs can significantly reduce imbalance and may benefit the performance and explainability offered by alert post-processing methods such as DeepCASE. Our findings therefore suggest that traditional methods to improve the quality of input data can benefit automation.
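To make the notion of label imbalance concrete, the following minimal sketch shows how a skewed label distribution can hide poor minority-class performance behind a high overall accuracy. It is not DeepCASE's method or evaluation protocol: the alert labels, the always-majority classifier, and the helper names `imbalance_ratio` and `per_class_recall` are all hypothetical and chosen purely for illustration.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the most frequent label count to the least frequent one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def per_class_recall(y_true, y_pred):
    """Recall for each label that occurs in y_true."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


# Hypothetical alert labels (assumption): 95 benign alerts, 5 attack alerts.
y_true = ["benign"] * 95 + ["attack"] * 5

# Hypothetical degenerate classifier that always predicts the majority label.
y_pred = ["benign"] * 100

# Overall accuracy looks high because the majority class dominates.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Per-class recall exposes that every attack alert is missed.
recall = per_class_recall(y_true, y_pred)

# Macro-averaged recall weights each class equally, regardless of its size.
macro_recall = sum(recall.values()) / len(recall)

print(f"imbalance ratio: {imbalance_ratio(y_true):.1f}")
print(f"accuracy:        {accuracy:.2f}")
print(f"per-class recall: {recall}")
print(f"macro recall:    {macro_recall:.2f}")
```

In this toy setting, overall accuracy stays high while recall on the minority class drops to zero, which is why class-sensitive metrics are commonly reported alongside accuracy when labels are imbalanced.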