Gotham Dataset 2025: A Reproducible Large-Scale IoT Network Dataset for Intrusion Detection and Security Research

📅 2025-02-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the scarcity of high-quality security datasets for heterogeneous, large-scale IoT environments, this paper introduces the Gotham Dataset 2025, a large-scale IoT traffic dataset captured in a distributed manner. It records network traffic from 78 emulated IoT devices using protocols including MQTT, CoAP, and RTSP, and contains both benign traffic and scripted attacks (DoS, Telnet brute force, network scanning, CoAP amplification, and several stages of C&C communication). Traffic was captured separately for each device at the interface between the IoT gateway and that device. The release includes two formats: raw PCAP files and labelled CSV files with extracted features. Data were collected on the Gotham testbed with tcpdump and processed in Python using tshark for feature extraction, and the dataset is publicly available on Zenodo. The dataset is intended to support reproducible development and evaluation of intrusion detection systems for complex, large-scale IoT environments.

📝 Abstract

In this paper, a dataset of IoT network traffic is presented. Our dataset was generated by utilising the Gotham testbed, an emulated large-scale Internet of Things (IoT) network designed to provide a realistic and heterogeneous environment for network security research. The testbed includes 78 emulated IoT devices operating on various protocols, including MQTT, CoAP, and RTSP. Network traffic was captured in Packet Capture (PCAP) format using tcpdump, and both benign and malicious traffic were recorded. Malicious traffic was generated through scripted attacks, covering a variety of attack types, such as Denial of Service (DoS), Telnet Brute Force, Network Scanning, CoAP Amplification, and various stages of Command and Control (C&C) communication. The data were subsequently processed in Python for feature extraction using the TShark tool, and the resulting data were converted to Comma Separated Values (CSV) format and labelled. The data repository includes the raw network traffic in PCAP format and the processed labelled data in CSV format. Our dataset was collected in a distributed manner, where network traffic was captured separately for each IoT device at the interface between the IoT gateway and the device. With its diverse traffic patterns and attack scenarios, this dataset provides a valuable resource for developing Intrusion Detection Systems and security mechanisms tailored to complex, large-scale IoT environments. The dataset is publicly available at Zenodo.
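The PCAP-to-labelled-CSV step described in the abstract could look roughly like the sketch below. It is only an illustration under stated assumptions: the field list, the function names, and the idea of labelling rows by attack time window are hypothetical and are not taken from the paper. The tshark options used (`-r`, `-T fields`, `-e`, `-E header=y`, `-E separator=,`) are standard ones for field export.

```python
import csv
import io

# Hypothetical field selection; the paper's actual feature set is not listed here.
FIELDS = ["frame.time_epoch", "ip.src", "ip.dst", "frame.len"]


def build_tshark_command(pcap_path, fields=FIELDS):
    """Return a tshark argument list that dumps the chosen fields as CSV."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields",
           "-E", "header=y", "-E", "separator=,"]
    for field in fields:
        cmd += ["-e", field]
    return cmd


def label_rows(csv_text, attack_windows):
    """Attach a label to each CSV row based on its capture timestamp.

    attack_windows is a list of (start_epoch, end_epoch, label) tuples;
    this format is an assumption made for the sketch. Rows that fall
    outside every window are labelled "benign".
    """
    labelled = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = float(row["frame.time_epoch"])
        row["label"] = "benign"
        for start, end, label in attack_windows:
            if start <= ts <= end:
                row["label"] = label
                break
        labelled.append(row)
    return labelled
```

In practice the command list would be passed to something like `subprocess.run(cmd, capture_output=True, text=True, check=True)` and the captured standard output handed to `label_rows`.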

Problem

Research questions and friction points this paper is trying to address.

Creates a reproducible IoT dataset for security research

Includes diverse attack types for intrusion detection

Provides raw and processed data in PCAP and CSV formats

Innovation

Methods, ideas, or system contributions that make the work stand out.

Emulated large-scale IoT network

Scripted attacks for malicious traffic

Distributed network traffic capture
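Because traffic is captured separately per device, a consumer of the dataset may want a single network-wide timeline. The sketch below shows one way to do that, assuming each device's records are already sorted by timestamp; the function name, record layout, and device names are hypothetical and not part of the dataset's tooling.

```python
import heapq


def merge_device_captures(per_device_rows):
    """Merge per-device record lists into one timeline ordered by timestamp.

    per_device_rows maps a device name to a list of (timestamp, payload)
    tuples that is already sorted by timestamp (an assumption of this sketch).
    Each merged record is tagged with the device it came from.
    """
    streams = (
        [(ts, device, payload) for ts, payload in rows]
        for device, rows in per_device_rows.items()
    )
    # heapq.merge lazily interleaves the sorted inputs without re-sorting them.
    return list(heapq.merge(*streams, key=lambda rec: rec[0]))
```

Keeping the device name in every merged record preserves the per-interface origin of each packet, which is the property the distributed capture provides.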
