🤖 AI Summary
Addressing the dual challenges of label scarcity and data silos in network intrusion detection, this paper proposes the first unsupervised federated Intrusion Detection System (IDS) framework tailored for cybersecurity. Methodologically, it integrates federated learning with unsupervised deep clustering and introduces a novel differentially private federated K-means++ initialization mechanism, so that the framework needs no labeled data and never shares raw traffic. Distributed clustering enables cross-institutional collaborative training while preserving local data privacy: participants contribute to model aggregation without exchanging the underlying data. Experiments across multi-party settings demonstrate that the framework loses less than 1.2% AUC relative to centralized baselines, significantly outperforming existing approaches, and provide the first empirical validation of unsupervised federated IDS feasibility in privacy-sensitive environments. Key contributions include: (i) the first unsupervised federated IDS architecture; (ii) a privacy-enhanced federated clustering initialization technique; and (iii) a label-free, data-agnostic collaborative detection paradigm.
📝 Abstract
Recent Intrusion Detection System (IDS) research has increasingly moved towards the adoption of machine learning methods. However, most of these systems rely on supervised learning approaches, necessitating a fully labeled training set. In the realm of network intrusion detection, the requirement for extensive labeling can become impractically burdensome. Moreover, while IDS training could benefit from inter-company knowledge sharing, the sensitive nature of cybersecurity data often precludes such cooperation. To address these challenges, we propose an IDS architecture that utilizes unsupervised learning to reduce the need for labeling. We further facilitate collaborative learning through the implementation of a federated learning framework. To enhance privacy beyond what current federated clustering models offer, we introduce an innovative federated K-means++ initialization technique. Our findings indicate that transitioning from a centralized to a federated setup does not significantly diminish performance.
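To make the idea of clustering without data exchange concrete, the following is a minimal sketch of one generic federated K-means scheme, not the paper's architecture. It assumes each client reports only per-cluster coordinate sums and point counts, which the server combines into new centroids. The function names (`local_update`, `aggregate`, `federated_kmeans`) are hypothetical, the initial centroids are supplied by the caller, and neither the paper's federated K-means++ initialization nor any differential privacy mechanism is reproduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_update(points: List[Point], centroids: List[Point]):
    """Client side (hypothetical helper): assign each local point to its
    nearest centroid and return per-cluster sums and counts only."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        k = min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def aggregate(updates, centroids: List[Point]) -> List[Point]:
    """Server side (hypothetical helper): merge client statistics into new
    centroids. A cluster that received no points keeps its old centroid."""
    dim = len(centroids[0])
    new_centroids = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in updates)
        if total == 0:
            new_centroids.append(tuple(old))
            continue
        new_centroids.append(tuple(
            sum(sums[k][d] for sums, _ in updates) / total
            for d in range(dim)
        ))
    return new_centroids


def federated_kmeans(clients: List[List[Point]],
                     centroids: List[Point],
                     rounds: int = 10) -> List[Point]:
    """Run a fixed number of rounds; raw points never leave a client."""
    for _ in range(rounds):
        updates = [local_update(points, centroids) for points in clients]
        centroids = aggregate(updates, centroids)
    return centroids
```

Because the server divides pooled sums by pooled counts, each round yields the same centroids as a centralized Lloyd step on the combined data. Note that the shared sums and counts are not differentially private on their own; a privacy mechanism such as the one the paper proposes for initialization would have to be added separately.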