ENCODE: Encoding NetFlows for Network Anomaly Detection

📅 2022-07-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing NetFlow preprocessing for anomaly detection overlooks both feature value frequencies and local traffic context, limiting semantic expressiveness and clustering efficacy. To address this, we propose a novel context-aware encoding method that jointly models feature occurrence frequency and local flow context—departing from conventional static numerical mapping—and enables semantic representation and automatic clustering of network behavior. Our approach integrates a custom-designed encoding algorithm with diverse supervised and unsupervised models (e.g., Random Forest, Autoencoder), and is evaluated on a newly constructed Kubernetes attack NetFlow benchmark dataset alongside two public datasets. Experimental results demonstrate an average 12.6% improvement in F1-score across all models, with a peak AUC of 0.987—significantly outperforming One-Hot and Embedding baselines. The method enhances both fine-grained anomaly discrimination and generalization capability.
📝 Abstract
NetFlow data is a popular network log format used by many network analysts and researchers. The advantages of using NetFlow over deep packet inspection are that it is easier to collect and process, and it is less privacy intrusive. Many works have used machine learning to detect network attacks using NetFlow data. The first step for these machine learning pipelines is to pre-process the data before it is given to the machine learning algorithm. Many approaches exist to pre-process NetFlow data; however, these simply apply existing methods to the data, not considering the specific properties of network data. We argue that for data originating from software systems, such as NetFlow or software logs, similarities in frequency and contexts of feature values are more important than similarities in the value itself. In this work, we propose an encoding algorithm that directly takes the frequency and the context of the feature values into account when the data is being processed. Different types of network behaviours can be clustered using this encoding, thus aiding the process of detecting anomalies within the network. We train several machine learning models for anomaly detection using the data that has been encoded with our encoding algorithm. We evaluate the effectiveness of our encoding on a new dataset that we created for network attacks on Kubernetes clusters and two well-known public NetFlow datasets. We empirically demonstrate that the machine learning models benefit from using our encoding for anomaly detection.
Problem

Research questions and friction points this paper is trying to address.

Network Anomaly Detection
Stream Data
Attack Behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature Frequency Context
Network Anomaly Detection
Machine Learning Optimization
🔎 Similar Papers
No similar papers found.