Handling Large-Scale Network Flow Records: A Comparative Study on Lossy Compression

📅 2025-04-29
🤖 AI Summary
Addressing storage pressure on ISP-scale network flow record systems, this work investigates lossy compression strategies that preserve utility for key analytical tasks, specifically predicting the domain name of contacted servers. It evaluates scalar quantization, principal component analysis (PCA), and vector quantization on real flow data from an operational campus network, integrating entropy coding and supervised learning to assess the trade-off between compression and utility. Scalar quantization provides the best trade-off between compression and prediction accuracy. PCA can preserve predictive accuracy but hampers subsequent entropy coding. Vector quantization shows promise but scales poorly because of the high dimensionality of flow features. The results point to scalar quantization as a practical, lightweight choice for flow record storage in large-scale network monitoring.

📝 Abstract
Flow records, which summarize the characteristics of traffic flows, are a practical and powerful way to monitor a network. While they already offer significant compression compared to full packet captures, their sheer volume remains daunting, especially for large Internet Service Providers (ISPs). In this paper, we investigate several lossy compression techniques to further reduce storage requirements while preserving the utility of flow records for key tasks, such as predicting the domain name of contacted servers. Our study evaluates scalar quantization, Principal Component Analysis (PCA), and vector quantization, applied to a real-world dataset from an operational campus network. Results reveal that scalar quantization provides the best tradeoff between compression and accuracy. PCA can preserve predictive accuracy but hampers subsequent entropy coding, and while vector quantization shows promise, it struggles to scale because of the high-dimensional nature of the data. These findings yield practical strategies for optimizing flow record storage in large-scale monitoring scenarios.
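To make the compression versus accuracy trade-off concrete, the following is a minimal sketch of uniform scalar quantization followed by an empirical entropy estimate, which approximates the bits per value an ideal entropy coder would need. It is an illustration only, not the paper's pipeline: the synthetic Gaussian feature, the step sizes, and the helper names `quantize`, `dequantize`, and `empirical_entropy_bits` are all assumptions introduced here.

```python
import math
import random
from collections import Counter


def quantize(values, step):
    """Uniform scalar quantization: map each value to the index of its nearest bin."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Reconstruct approximate values from bin indices."""
    return [i * step for i in indices]


def empirical_entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of the observed symbol distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Synthetic stand-in for a single flow feature (an assumption, not real data).
rng = random.Random(0)
feature = [rng.gauss(10.0, 2.0) for _ in range(10_000)]

for step in (0.1, 0.5, 2.0):
    idx = quantize(feature, step)
    recon = dequantize(idx, step)
    max_err = max(abs(a - b) for a, b in zip(feature, recon))
    bits = empirical_entropy_bits(idx)
    print(f"step={step}: {bits:.2f} bits/value, max error {max_err:.3f}")
```

A larger step lowers the entropy of the bin indices, so fewer bits are needed after entropy coding, at the cost of a reconstruction error of up to half a step per value. Whether that error matters for a downstream task such as server name prediction has to be measured on real data.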
Problem

Research questions and friction points this paper is trying to address.

Reducing storage for large-scale network flow records
Evaluating lossy compression techniques for flow data
Balancing compression efficiency and data utility
Innovation

Methods, ideas, or system contributions that make the work stand out.

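Identifies scalar quantization as the best trade-off between compression and accuracy
Shows that PCA can preserve predictive accuracy but hampers subsequent entropy coding
Evaluates vector quantization, which shows promise but struggles to scale on high-dimensional data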
Authors

Gabriele Merlach
University of Trieste

Damiano Ravalico
University of Trieste

Martino Trevisan
Associate Professor, University of Trieste
Network Measurements, Online Social Networks, Data Privacy

Fabio Palmese
Politecnico di Milano

Giovanni Baccichet
Politecnico di Milano

Alessandro E. C. Redondi
Associate Professor, Politecnico di Milano
Network Data Analysis, Internet of Things, Wireless Networks