A Crowdsensing Intrusion Detection Dataset For Decentralized Federated Learning Models

📅 2025-07-17
🤖 AI Summary
This work addresses the trade-off between privacy and efficiency in malware detection for Internet-of-Things (IoT) crowdsensing. It presents a dataset and an experimental study built on decentralized federated learning (DFL), which keeps data local while enabling collaborative model training. The dataset aggregates 21,582,484 raw behavioral records into 30-second windows, yielding the 342,106 features used for training and evaluation. The records cover system calls, file system activities, resource usage, kernel events, input/output events, and network records, collected from benign behavior and eight malware families. Experiments across different node counts, network topologies, and non-IID data distributions compare traditional machine learning (ML), centralized federated learning (CFL), and DFL. They show that DFL maintains competitive performance and outperforms CFL in most settings while preserving data locality. The dataset provides a foundation for studying security in IoT crowdsensing environments.

📝 Abstract
This paper introduces a dataset and experimental study for decentralized federated learning (DFL) applied to IoT crowdsensing malware detection. The dataset comprises behavioral records from benign applications and eight malware families. A total of 21,582,484 original records were collected from system calls, file system activities, resource usage, kernel events, input/output events, and network records. These records were aggregated into 30-second windows, resulting in 342,106 features used for model training and evaluation. Experiments on the DFL platform compare traditional machine learning (ML), centralized federated learning (CFL), and DFL across different node counts, topologies, and data distributions. Results show that DFL maintains competitive performance while preserving data locality, outperforming CFL in most settings. This dataset provides a solid foundation for studying the security of IoT crowdsensing environments.
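
The 30-second aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes fixed, non-overlapping windows and simple per-source event counts as features; the actual windowing scheme and feature definitions are not given here, and the record layout and the `aggregate_windows` helper are hypothetical.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 30  # window length stated in the abstract


def aggregate_windows(records, window_seconds=WINDOW_SECONDS):
    """Group (timestamp_seconds, event_source) records into fixed windows.

    Assumption: windows do not overlap, and the feature for each window
    is the number of events seen per source.
    """
    windows = defaultdict(Counter)
    for timestamp, event_source in records:
        window_index = int(timestamp // window_seconds)
        windows[window_index][event_source] += 1
    # Return plain dicts, ordered by window index.
    return {idx: dict(counts) for idx, counts in sorted(windows.items())}


# Hypothetical raw records: (seconds since start of capture, event source).
records = [
    (0.5, "syscall"),
    (12.0, "file_op"),
    (29.9, "syscall"),
    (30.0, "network"),
    (45.2, "syscall"),
]

features = aggregate_windows(records)
for window_index, counts in features.items():
    print(window_index, counts)
```

Many raw records collapse into one row per window, which is consistent with 21,582,484 records shrinking to a far smaller number of aggregated entries.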
Problem

Research questions and friction points this paper is trying to address.

Detecting malware in IoT crowdsensing using decentralized federated learning
Evaluating DFL performance against traditional ML and centralized FL
Providing a dataset for IoT security research with behavioral records
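
To make the contrast between CFL and DFL concrete, the sketch below shows one aggregation round of each on toy one-parameter models. It assumes plain unweighted averaging and uses ring and fully connected topologies as examples; the aggregation rule and topologies actually used in the paper are not specified here, and all function names are hypothetical.

```python
def ring_topology(num_nodes):
    """Each node is linked to its two ring neighbors (use num_nodes >= 3)."""
    return {i: [(i - 1) % num_nodes, (i + 1) % num_nodes] for i in range(num_nodes)}


def fully_connected_topology(num_nodes):
    """Each node is linked to every other node."""
    return {i: [j for j in range(num_nodes) if j != i] for i in range(num_nodes)}


def average(parameter_lists):
    """Element-wise mean of several equally long parameter lists."""
    return [sum(values) / len(parameter_lists) for values in zip(*parameter_lists)]


def dfl_round(models, topology):
    """Decentralized round: each node averages its model with its neighbors' models."""
    return {
        node: average([models[node]] + [models[n] for n in neighbors])
        for node, neighbors in topology.items()
    }


def cfl_round(models):
    """Centralized round: a server averages all models and sends the result back."""
    global_model = average(list(models.values()))
    return {node: list(global_model) for node in models}


# Toy models: one parameter per node, standing in for locally trained weights.
models = {0: [0.0], 1: [4.0], 2: [8.0], 3: [12.0]}

ring_models = dfl_round(models, ring_topology(4))
full_models = dfl_round(models, fully_connected_topology(4))
central_models = cfl_round(models)
print(ring_models, full_models, central_models)
```

With a fully connected topology, one DFL round gives every node the same average a central server would compute. With a sparser topology such as a ring, nodes only move toward agreement over several rounds, which is why topology is a variable in this kind of comparison.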
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized federated learning for IoT malware detection
Behavioral records aggregated into 30-second windows
Comparative study of ML, CFL, and DFL performance
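
The comparison across data distributions depends on how samples are split among nodes. The sketch below contrasts an IID split with a simple label-skew split, assuming nine classes (benign plus eight malware families). The partitioning schemes used in the paper are not described here, so both helper functions are hypothetical illustrations.

```python
import random

NUM_CLASSES = 9  # benign plus eight malware families


def partition_iid(labels, num_nodes, seed=0):
    """Shuffle sample indices and deal them out so every node sees a similar class mix."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    return [indices[i::num_nodes] for i in range(num_nodes)]


def partition_label_skew(labels, num_nodes):
    """Sort sample indices by label and cut them into contiguous chunks,
    so each node sees only a few classes (one simple form of non-IID data)."""
    indices = sorted(range(len(labels)), key=lambda i: labels[i])
    chunk = -(-len(indices) // num_nodes)  # ceiling division
    return [indices[i * chunk:(i + 1) * chunk] for i in range(num_nodes)]


# Toy labels: ten samples for each of the nine classes.
labels = [c for c in range(NUM_CLASSES) for _ in range(10)]

iid_parts = partition_iid(labels, num_nodes=3)
skew_parts = partition_label_skew(labels, num_nodes=3)

for node, part in enumerate(skew_parts):
    print(node, sorted({labels[i] for i in part}))
```

Under the label-skew split each node trains on a disjoint subset of classes, so a node cannot learn the other families without exchanging model updates with its peers.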

Chao Feng
University of Zurich
network, machine learning, cybersecurity

Alberto Huertas Celdran
University of Murcia
Cybersecurity, Brain-Computer Interfaces, Federated Learning, Trusted AI

Jing Han
University of Cambridge
deep learning, audio signal processing, machine learning, mHealth, affective computing

Heqing Ren
Communication Systems Group, Department of Informatics, University of Zurich UZH, CH–8050 Zürich, Switzerland

Xi Cheng
Communication Systems Group, Department of Informatics, University of Zurich UZH, CH–8050 Zürich, Switzerland

Zien Zeng
Communication Systems Group, Department of Informatics, University of Zurich UZH, CH–8050 Zürich, Switzerland

Lucas Krauter
University of Zurich
Natural Language Processing, Machine Learning, Software Engineering

Gerome Bovet
Cyber-Defence Campus, armasuisse Science & Technology, CH–3602 Thun, Switzerland

Burkhard Stiller
Communication Systems Group, Department of Informatics, University of Zurich UZH, CH–8050 Zürich, Switzerland