🤖 AI Summary
To address the deployment trust deficit in deep learning–driven IoT network intrusion detection systems (DL-NIDS) stemming from their “black-box” nature, this study conducts the first cross-model and cross-method explainability evaluation. It integrates XAI techniques—including TRUSTEE and SHAP—with LSTM and CNN architectures, validated on the CIC-IoT-2023 and NSL-KDD multi-attack datasets, to systematically analyze decision rationales, salient features, and vulnerabilities arising from inductive biases. Key contributions include: (1) revealing substantial disparities across DL-NIDS in both explainability fidelity and bias robustness; (2) identifying severe explanation conflicts among mainstream XAI methods, undermining foundational trustworthiness; (3) proposing a novel model credibility metric grounded in explanation conflict degree; and (4) empirically demonstrating that several models exhibit heightened sensitivity to spurious correlations—exposing critical inductive bias vulnerabilities that compromise generalization and security assurance.
📝 Abstract
Network Intrusion Detection Systems (NIDSs) that use deep learning (DL) models achieve high detection performance and accuracy while avoiding dependence on fixed signatures extracted from attack artifacts. However, network security experts and practitioners remain noticeably hesitant to deploy DL-based NIDSs in real-world production environments due to their black-box nature, i.e., it is unclear how and why the underlying models make their decisions. In this work, we analyze state-of-the-art DL-based NIDS models using explainable AI (xAI) techniques (e.g., TRUSTEE, SHAP) through extensive experiments with two different attack datasets. Using the explanations generated for the models' decisions, we present the most prominent features used by each NIDS model considered. We compare the explanations generated across xAI methods for a given NIDS model, as well as the explanations generated across the NIDS models for a given xAI method. Finally, we evaluate the vulnerability of each NIDS model to inductive bias (artifacts learnt from training data). The results show that: (1) some DL-based NIDS models can be better explained than others, (2) xAI explanations conflict with one another for most of the NIDS models considered in this work, and (3) some NIDS models are more vulnerable to inductive bias than others.
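The cross-method comparison above hinges on measuring how much two xAI methods disagree about which features drive a model's decisions. A minimal sketch of one way to quantify such "explanation conflict" is shown below: compare the feature-importance rankings produced by two methods using a normalized rank-displacement (Spearman footrule) distance. The function names, feature names, and importance scores here are illustrative assumptions, not the paper's actual metric or data.

```python
# Hypothetical sketch: quantifying "explanation conflict" between two xAI
# methods as disagreement between their feature-importance rankings.
# Feature names and scores below are made up for illustration.

def rank(scores):
    """Map each feature to its rank (0 = most important)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {feat: i for i, feat in enumerate(ordered)}

def conflict_degree(imp_a, imp_b):
    """Normalized footrule distance between two rankings, in [0, 1]:
    0 = identical rankings, 1 = maximal disagreement."""
    ra, rb = rank(imp_a), rank(imp_b)
    n = len(ra)
    max_dist = n * n // 2  # maximum total rank displacement for n items
    dist = sum(abs(ra[f] - rb[f]) for f in ra)
    return dist / max_dist

# Illustrative per-feature importances from two xAI methods (not real output)
shap_like = {"flow_duration": 0.9, "pkt_size": 0.5, "ttl": 0.3, "dst_port": 0.1}
trustee_like = {"flow_duration": 0.2, "pkt_size": 0.9, "ttl": 0.1, "dst_port": 0.6}

print(conflict_degree(shap_like, shap_like))     # → 0.0 (no conflict)
print(conflict_degree(shap_like, trustee_like))  # → 0.75 (strong conflict)
```

A higher conflict degree would then lower the model's credibility score, in the spirit of the metric the summary describes; real SHAP and TRUSTEE outputs would first need to be reduced to comparable per-feature importance vectors.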