🤖 AI Summary
Problem: Traditional sketches suffer from low accuracy in dynamic network flow mining, while ML-based optimization methods exhibit poor adaptability and high training overhead. Method: This paper proposes a novel two-tier flow sketch architecture integrating a large language model (LLM). It pioneers the use of lightweight fine-tuned LLMs in sketch design, leveraging non-ID packet-header fields (e.g., TTL, IP identification) for joint flow feature modeling. A decoupled two-tier structure is introduced: an upper tier classifies flows into heavy vs. light categories, while a lower tier performs regression-based flow size estimation, balancing memory efficiency and dynamic adaptability. Results: On three representative flow analysis tasks, the method achieves 7.5× higher accuracy than state-of-the-art approaches, reduces training cost by an order of magnitude, and significantly improves robustness to abrupt traffic distribution shifts.
📝 Abstract
Network stream mining is fundamental to many network operations. Sketches, as compact data structures that offer low memory overhead with bounded accuracy, have emerged as a promising solution for network stream mining. Recent studies attempt to optimize sketches using machine learning; however, these approaches face the challenges of lacking adaptivity to dynamic networks and incurring high training costs. In this paper, we propose LLM-Sketch, based on the insight that fields beyond the flow IDs in packet headers can also help infer flow sizes. By using a two-tier data structure and separately recording large and small flows, LLM-Sketch improves accuracy while minimizing memory usage. Furthermore, it leverages fine-tuned large language models (LLMs) to reliably estimate flow sizes. We evaluate LLM-Sketch on three representative tasks, and the results demonstrate that LLM-Sketch outperforms state-of-the-art methods by achieving a $7.5\times$ accuracy improvement.
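To make the two-tier idea concrete, the snippet below is a minimal, generic sketch of a heavy/light split — it is not the paper's design (LLM-Sketch's classifier is a fine-tuned LLM over packet-header fields, and its lower tier is regression-based). Here the class names, the count-based promotion rule, and all parameters (`heavy_capacity`, `width`, `depth`, `promote_at`) are illustrative assumptions: heavy flows get exact per-flow counters, while light flows share a count-min sketch, trading a bounded overestimate for memory.

```python
import hashlib


class TwoTierSketch:
    """Illustrative two-tier flow sketch (not LLM-Sketch itself):
    an upper tier holds exact counters for flows classified as heavy;
    a lower tier (count-min sketch) estimates the many light flows."""

    def __init__(self, heavy_capacity=64, width=1024, depth=3, promote_at=128):
        self.heavy = {}                       # flow_id -> exact count
        self.heavy_capacity = heavy_capacity
        self.promote_at = promote_at          # stand-in for the paper's learned classifier
        self.depth = depth
        self.width = width
        self.cm = [[0] * width for _ in range(depth)]

    def _cells(self, flow_id):
        # One independent hash position per count-min row.
        for i in range(self.depth):
            h = hashlib.sha256(f"{i}:{flow_id}".encode()).digest()
            yield i, int.from_bytes(h[:8], "big") % self.width

    def add(self, flow_id, n=1):
        if flow_id in self.heavy:
            self.heavy[flow_id] += n
            return
        for i, j in self._cells(flow_id):
            self.cm[i][j] += n
        est = self._cm_query(flow_id)
        # Promote once the estimate crosses the threshold and space remains.
        if est >= self.promote_at and len(self.heavy) < self.heavy_capacity:
            self.heavy[flow_id] = est

    def _cm_query(self, flow_id):
        # Minimum over rows: never underestimates the true count.
        return min(self.cm[i][j] for i, j in self._cells(flow_id))

    def query(self, flow_id):
        if flow_id in self.heavy:
            return self.heavy[flow_id]
        return self._cm_query(flow_id)
```

Separating the tiers keeps the memory budget small: only the few heavy flows pay for exact counters, while the light-flow tier amortizes its cells across all remaining flows.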