🤖 AI Summary
This work addresses the poor generalization of federated learning in Internet of Things (IoT) environments, where device heterogeneity is high and much of the data is unlabeled. The authors propose CLAD, a framework that combines Clustered Federated Learning (CFL) with a Dual-Mode Micro-Architecture (DM²A). DM²A uses a shared encoder followed by two branches to jointly perform unsupervised anomaly detection and supervised attack classification, so that both labeled and unlabeled clients contribute to training. The clustering component dynamically groups devices with similar behavioral patterns, which preserves local model characteristics and mitigates model divergence. The approach is presented as the first to combine clustering mechanisms with label-agnostic dual-task learning. In a setting where 80% of clients lack labels, the method achieves a 30% relative improvement in detection performance over existing approaches at half the communication cost.
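The dual-branch idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the single-layer encoder, the linear decoder and classifier heads, the weighting factor `alpha`, and all function names are assumptions made only for illustration. The point it shows is that every client contributes a reconstruction (anomaly detection) loss, while the classification loss is added only when a label is available.

```python
import math

def encode(x, w_enc):
    """Shared encoder (assumed here: one linear layer followed by tanh)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_enc]

def linear(z, w):
    """Plain linear head, used for both the decoder and the classifier."""
    return [sum(wi * zi for wi, zi in zip(row, z)) for row in w]

def reconstruction_loss(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[label]

def dual_mode_loss(x, label, w_enc, w_dec, w_cls, alpha=1.0):
    """Unsupervised branch always runs; supervised branch only if labeled."""
    z = encode(x, w_enc)
    loss = reconstruction_loss(x, linear(z, w_dec))
    if label is not None:  # labeled client: add the classification term
        loss += alpha * cross_entropy(linear(z, w_cls), label)
    return loss
```

An unlabeled client would call `dual_mode_loss(x, None, ...)` and still produce a useful training signal for the shared encoder, which is how a design of this kind avoids discarding unlabeled data.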
📝 Abstract
The rapid expansion of the Internet of Things (IoT) and Industrial IoT (IIoT) has created a massive, heterogeneous attack surface that challenges traditional network security mechanisms. While Federated Learning (FL) offers a privacy-preserving alternative to centralized Intrusion Detection Systems (IDS), standard approaches struggle to generalize across diverse device behaviors and typically fail to utilize the vast amounts of unlabeled data present in realistic edge environments. To bridge these gaps, we propose CLAD, a framework that combines Clustered Federated Learning (CFL) with a novel Dual-Mode Micro-Architecture ($\text{DM}^2\text{A}$). This unified approach simultaneously tackles the two primary bottlenecks of IoT security: device heterogeneity and label scarcity. The $\text{DM}^2\text{A}$ component features a shared encoder followed by two branches, enabling joint unsupervised anomaly detection and supervised attack classification; this allows the framework to learn from both labeled and unlabeled clients. Concurrently, the clustering component dynamically groups devices with similar traffic patterns, preventing global model divergence. By combining these elements, CLAD ensures that no data is discarded and that distinct operational patterns are preserved. Extensive evaluations demonstrate that this integrated approach significantly outperforms state-of-the-art baselines, achieving a 30% relative improvement in detection performance in scenarios with 80% unlabeled clients, at only half the communication cost.
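The clustering step can be sketched as follows. The abstract does not state how similarity between devices is measured, so this example assumes cosine similarity over client update vectors, a fixed threshold, and a simple greedy assignment; the names `cluster_clients` and `aggregate` and the client identifiers are hypothetical. Each cluster is then averaged separately, so that dissimilar devices do not pull a single global model apart.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_clients(updates, threshold=0.8):
    """Greedy grouping: a client joins the first cluster whose first member
    is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for cid, vec in updates.items():
        for members in clusters:
            if cosine(vec, updates[members[0]]) >= threshold:
                members.append(cid)
                break
        else:  # no existing cluster matched
            clusters.append([cid])
    return clusters

def aggregate(updates, members):
    """Unweighted average of the update vectors within one cluster."""
    n = len(members)
    dim = len(updates[members[0]])
    return [sum(updates[c][i] for c in members) / n for i in range(dim)]

# Hypothetical updates from two cameras and one industrial controller.
updates = {"cam1": [1.0, 0.0], "cam2": [0.9, 0.1], "plc1": [0.0, 1.0]}
clusters = cluster_clients(updates)
cluster_models = [aggregate(updates, members) for members in clusters]
```

A real system would likely re-cluster over training rounds and weight the average by client sample counts; both are omitted here to keep the sketch short.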