A Crowdsensing Intrusion Detection Dataset For Decentralized Federated Learning Models

📅 2025-07-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses malware detection in Internet-of-Things (IoT) crowdsensing, where collaborative training must be balanced against keeping data on the devices that produce it. It presents a dataset and an experimental study for decentralized federated learning (DFL), which preserves data locality while enabling collaborative model training. The dataset covers benign behavior and eight malware families, drawing on system calls, file system activities, resource usage, kernel events, input/output events, and network records. The 21,582,484 original records were aggregated into 30-second windows, yielding 342,106 features for training and evaluation. Experiments across different node counts, network topologies, and data distributions compare traditional machine learning (ML), centralized federated learning (CFL), and DFL; DFL maintains competitive performance and outperforms CFL in most settings. The dataset is offered as a foundation for studying the security of IoT crowdsensing environments.

📝 Abstract

This paper introduces a dataset and experimental study for decentralized federated learning (DFL) applied to IoT crowdsensing malware detection. The dataset comprises behavioral records covering benign behavior and eight malware families. A total of 21,582,484 original records were collected from system calls, file system activities, resource usage, kernel events, input/output events, and network records. These records were aggregated into 30-second windows, resulting in 342,106 features used for model training and evaluation. Experiments on the DFL platform compare traditional machine learning (ML), centralized federated learning (CFL), and DFL across different node counts, topologies, and data distributions. Results show that DFL maintains competitive performance while preserving data locality, outperforming CFL in most settings. This dataset provides a solid foundation for studying the security of IoT crowdsensing environments.
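
The 30-second aggregation step mentioned in the abstract can be pictured with a small sketch. It is not the authors' pipeline: the record layout (a timestamp in seconds plus a source name), the per-source event counts used as features, and the function name aggregate_windows are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 30  # window length stated in the abstract


def aggregate_windows(records, window_seconds=WINDOW_SECONDS):
    """Group (timestamp, source) records into fixed, non-overlapping windows.

    `records` is an iterable of (timestamp_in_seconds, source_name) pairs.
    Returns a dict mapping window index -> Counter of events per source.
    Counting events per source is an assumed feature, not the paper's.
    """
    windows = defaultdict(Counter)
    for timestamp, source in records:
        window_index = int(timestamp // window_seconds)
        windows[window_index][source] += 1
    return dict(windows)


# Hypothetical records; the source names are illustrative only.
sample_records = [
    (0.5, "syscall"),
    (12.0, "syscall"),
    (29.9, "network"),
    (30.0, "file"),
    (61.2, "syscall"),
]
features = aggregate_windows(sample_records)
```

Here the five sample records fall into three windows, so many raw records collapse into far fewer window-level entries, which is the same kind of reduction the abstract reports.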

Problem

Research questions and friction points this paper is trying to address.

Detecting malware in IoT crowdsensing using decentralized federated learning

Evaluating DFL performance against traditional ML and centralized FL

Providing a dataset of behavioral records for IoT security research

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized federated learning for IoT malware detection

Behavioral records aggregated into 30-second windows

Comparative study of ML, CFL, and DFL performance
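
To make the CFL versus DFL contrast concrete, the sketch below shows one aggregation round of each on toy parameter vectors. It assumes plain unweighted averaging and a four-node ring topology; the page does not state which aggregation rule or topologies the paper uses, and the function names cfl_round and dfl_round are hypothetical.

```python
def cfl_round(params):
    """Centralized round: a server averages all nodes' parameters
    and sends the same global vector back to every node."""
    count = len(params)
    average = [sum(values) / count for values in zip(*params.values())]
    return {node: list(average) for node in params}


def dfl_round(params, neighbors):
    """Decentralized round: each node averages its own parameters
    with those of its direct neighbors only; there is no server."""
    updated = {}
    for node, own in params.items():
        group = [own] + [params[peer] for peer in neighbors[node]]
        updated[node] = [sum(values) / len(group) for values in zip(*group)]
    return updated


# Toy one-dimensional "models" on an assumed ring of four nodes.
params = {0: [0.0], 1: [4.0], 2: [8.0], 3: [12.0]}
ring = {0: [3, 1], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

after_cfl = cfl_round(params)
after_dfl = dfl_round(params, ring)
```

After one centralized round every node holds the global mean, whereas after one decentralized round the nodes still differ and only move toward agreement over repeated rounds. This is why the choice of topology and node count matters in a DFL comparison.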
