🤖 AI Summary
To address the security challenge of IoT firmware version identification under scarce labeled data, this paper proposes a transfer learning–based Twin Neural Network (TNN) approach. First, statistical features extracted from on-wire packet flows are converted into grayscale images; then, a TNN already trained to distinguish devices scores the similarity between traffic from the same device on different firmware versions. The Hedges' g effect size of these similarity scores quantifies how far they have shifted, which allows the subtle traffic differences caused by a version change to be detected. Because the TNN is transferred from device identification, the approach requires relatively little training data. Evaluated on packet captures recorded in a lab from 12 IoT devices over 11 days and multiple firmware versions, the best-performing model is 95.83% accurate at identifying stable versions and 84.38% accurate at detecting version changes on held-out test data.
📝 Abstract
As the Internet of Things (IoT) becomes more embedded within our daily lives, there is growing concern about the risk 'smart' devices pose to network security. To address this, one avenue of research has focused on automated IoT device identification. Research has, however, largely neglected the identification of IoT device firmware versions. There is strong evidence that IoT security relies on devices running the latest version, patched for known vulnerabilities. Identifying whether a device has updated (has changed version) or not (is on a stable version) is therefore useful for IoT security. Version identification involves challenges beyond those of identifying the model, type, and manufacturer of IoT devices, and traditional machine learning algorithms are ill-suited to it because they are limited by the availability of training data. In this paper, we introduce an effective technique for identifying IoT device versions based on transfer learning. This technique relies on the idea that a Twin Neural Network (TNN), trained to distinguish devices, can be used to detect differences between the same device on different versions. This facilitates real-world implementation by requiring relatively little training data. We extract statistical features from on-wire packet flows, convert these features into greyscale images, pass these images into a TNN, and determine version changes based on the Hedges' g effect size of the similarity scores. This allows us to detect the subtle changes present in on-wire traffic when a device changes version. To evaluate our technique, we set up a lab containing 12 IoT devices and recorded their on-wire packet captures for 11 days across multiple firmware versions. On testing data held out from training, our best-performing model is 95.83% accurate at identifying stable versions and 84.38% accurate at identifying version changes.
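The last two stages of the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the min-max scaling and zero-padding used to build the image, and the decision threshold of 0.8 are all assumptions made for illustration, and the TNN itself is left out (its similarity scores are simply taken as input). Only the Hedges' g formula is standard: the pooled-standard-deviation effect size multiplied by the usual small-sample correction 1 - 3 / (4(n1 + n2) - 9).

```python
import math

import numpy as np


def features_to_greyscale(features, side):
    """Map a 1-D feature vector to a side x side 8-bit greyscale image.

    Assumption: min-max scaling to 0..255 and zero-padding; the abstract
    does not say how the features are laid out in the image.
    """
    x = np.asarray(features, dtype=float)
    if x.size > side * side:
        raise ValueError("feature vector does not fit in the image")
    span = x.max() - x.min()
    scaled = np.zeros_like(x) if span == 0 else (x - x.min()) / span
    pixels = np.zeros(side * side, dtype=np.uint8)
    pixels[: x.size] = np.round(scaled * 255).astype(np.uint8)
    return pixels.reshape(side, side)


def hedges_g(a, b):
    """Hedges' g effect size between two samples of similarity scores."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two scores")
    # Pooled variance from the unbiased (ddof=1) sample variances.
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    d = (a.mean() - b.mean()) / math.sqrt(pooled_var)
    # Approximate small-sample bias correction applied to Cohen's d.
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


def version_changed(baseline_scores, new_scores, threshold=0.8):
    """Flag a version change when |g| reaches a hypothetical threshold."""
    return bool(abs(hedges_g(baseline_scores, new_scores)) >= threshold)
```

In use, `baseline_scores` would hold TNN similarity scores from a period when the device is known to be on a stable version, and `new_scores` would hold the scores from the window under test. Using an effect size rather than a raw difference of means makes the decision depend on how large the shift is relative to the normal spread of scores.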