🤖 AI Summary
Existing sketch-based superspreader detection methods estimate flow cardinality from full IP addresses only, ignoring communication patterns within subnets, which leads to high false-positive rates and low accuracy. Hierarchical approaches can capture subnet-level cardinalities, but they incur high memory overhead. To address these limitations, this work proposes SegSketch, which uses a lightweight halved-segment hashing strategy to infer the length of shared IP prefixes and performs segmented cardinality estimation within subnets, balancing communication locality against memory efficiency. Experiments on real-world traces show that SegSketch improves F1 score by up to 8.04× over state-of-the-art methods, particularly under small memory budgets.
📝 Abstract
Accurately detecting super hosts, which establish connections to a large number of distinct peers, is important for mitigating web attacks and ensuring high-quality web service. Existing sketch-based approaches estimate the number of distinct connections, called flow cardinality, from full IP addresses. They ignore the fact that a malicious or victim super host often communicates with hosts within the same subnet, which results in high false positive rates and low accuracy. Although hierarchical-structure-based approaches can capture flow cardinality within subnets, they inherently suffer from high memory usage. To address these limitations, we propose SegSketch, a segmented cardinality estimation approach that employs a lightweight halved-segment hashing strategy to infer common prefix lengths of IP addresses and estimates cardinality within subnets to enhance detection accuracy under constrained memory. Experiments driven by real-world traces demonstrate that SegSketch improves F1-Score by up to 8.04x compared to state-of-the-art solutions, particularly under small memory budgets.
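
To make the terms in the abstract concrete, the following Python sketch computes flow cardinality exactly, both per source and per (source, destination subnet), along with the common prefix length of two IPv4 addresses. It is not SegSketch and uses no sketch data structure: it stores every peer in a set, so its memory grows with traffic. The function names, the fixed /24 subnet size, and the sample packets are all assumptions made for illustration only.

```python
import ipaddress
from collections import defaultdict


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def flow_cardinality(packets):
    """Exact number of distinct destinations contacted by each source."""
    peers = defaultdict(set)
    for src, dst in packets:
        peers[src].add(dst)
    return {src: len(dsts) for src, dsts in peers.items()}


def subnet_cardinality(packets, prefix_len=24):
    """Exact number of distinct destinations per (source, destination subnet).

    The fixed prefix_len is an illustrative assumption; the abstract says
    SegSketch infers common prefix lengths rather than fixing them.
    """
    peers = defaultdict(set)
    for src, dst in packets:
        net = ipaddress.ip_network(f"{dst}/{prefix_len}", strict=False)
        peers[(src, str(net))].add(dst)
    return {key: len(dsts) for key, dsts in peers.items()}


# Hypothetical (source, destination) pairs, one per observed packet.
packets = [
    ("10.0.0.1", "192.168.1.10"),
    ("10.0.0.1", "192.168.1.11"),
    ("10.0.0.1", "192.168.1.10"),  # repeated flow, counted once
    ("10.0.0.1", "172.16.5.9"),
    ("10.0.0.2", "192.168.1.10"),
]

per_source = flow_cardinality(packets)
per_subnet = subnet_cardinality(packets)
```

Here `per_source` counts all distinct peers of a host regardless of where they sit, while `per_subnet` separates the peers concentrated in one subnet from the rest, which is the locality that the abstract says full-address estimation ignores.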