Memory-efficient Sketch Acceleration for Handling Large Network Flows on FPGAs

📅 2025-04-23
🤖 AI Summary
Problem: Count-min sketch (CMS) accelerators for high-speed network traffic monitoring lose accuracy and throughput because FPGA on-chip memory is limited. Method: We propose a hardware-friendly variable-bit-width counter architecture that fits a larger hash table into the same memory and reduces estimation error. We combine P4-programmable data plane logic with the AMD OpenNIC shell on the Alveo U280 FPGA in a customized, deeply pipelined hardware architecture. Contribution/Results: Our system operates in real time at the 100 Gbps line rate, with lower overestimation error and 2.3× higher memory efficiency than baseline CMS implementations. End-to-end validation was conducted on the Open Cloud Testbed. To the best of our knowledge, this is the first work to realize a P4 + OpenNIC co-design framework for large-scale flow estimation, giving a scalable, accurate, high-throughput hardware solution for resource-constrained network measurement.

📝 Abstract
Sketch-based algorithms for network traffic monitoring have drawn increasing interest in recent years due to their sub-linear memory usage and high accuracy. As the volume of network traffic grows, software-based sketch implementations cannot match the throughput of the incoming network flows. FPGA-based hardware sketches have shown better performance than software running on a CPU when handling these packets. Among the various sketch algorithms, Count-min sketch is one of the most popular and efficient. However, due to the limited amount of on-chip memory, FPGA-based Count-min sketch accelerators suffer from performance drops as network traffic grows. In this work, we propose a hardware-friendly architecture with a variable-width memory counter for Count-min sketch. Our architecture provides a more compact design that stores the sketch data structure effectively, allowing us to support larger hash tables and reduce overestimation errors. The design makes use of a P4-based programmable data plane and the AMD OpenNIC shell. It is implemented and verified on the Open Cloud Testbed running on AMD Alveo U280s and keeps up with the 100 Gbit/s link speed.
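
For readers unfamiliar with the data structure, the following is a minimal software sketch of a plain Count-min sketch with fixed-width counters. It is not the paper's hardware design. The class name, the use of salted BLAKE2b as the per-row hash, and the flow-key strings are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Plain Count-min sketch: `depth` rows of `width` counters each."""

    def __init__(self, depth: int, width: int) -> None:
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: bytes) -> int:
        # Assumption: a different salt per row stands in for the
        # independent hash functions a hardware design would use.
        salt = row.to_bytes(8, "little")
        digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key: bytes, count: int = 1) -> None:
        # Every row increments exactly one counter for this key.
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key: bytes) -> int:
        # Collisions can only add to a counter, so the minimum across
        # rows is the tightest available upper bound on the true count.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


sketch = CountMinSketch(depth=4, width=1024)
for _ in range(5):
    sketch.update(b"10.0.0.1->10.0.0.2")
sketch.update(b"10.0.0.3->10.0.0.4", 2)
```

The estimate for a flow is never below its true count; it exceeds it only when other flows collide with it in every row. This is the overestimation error the abstract refers to, and it shrinks as each row gets more counters.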
Problem

Research questions and friction points this paper is trying to address.

Handling large network flows with limited FPGA memory
Improving Count-min sketch performance for high traffic
Reducing overestimation errors in network traffic monitoring
Innovation

Methods, ideas, or system contributions that make the work stand out.

FPGA-based variable width memory counter
P4-based programmable data plane
AMD OpenNIC shell integration
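
To see why counter width matters under a fixed memory budget, the arithmetic below compares how many counters per row fit at two fixed widths. It relies on the standard Count-min guarantee that, with w counters per row, the additive overestimate is at most (e / w) times the total count, with a probability that depends on the number of rows. The 1 MiB budget, the 4 rows, and the 32-bit and 16-bit widths are illustrative assumptions, not figures from the paper, and the paper's variable-width scheme is not modeled here.

```python
import math


def counters_per_row(budget_bits: int, rows: int, counter_bits: int) -> int:
    """Number of counters each row can hold within a total bit budget."""
    return budget_bits // (rows * counter_bits)


def error_factor(width: int) -> float:
    """Standard Count-min bound: overestimate <= (e / width) * total count."""
    return math.e / width


BUDGET_BITS = 8 * 2**20  # assumed budget: 1 MiB of on-chip memory
ROWS = 4                 # assumed sketch depth

wide = counters_per_row(BUDGET_BITS, ROWS, counter_bits=32)
narrow = counters_per_row(BUDGET_BITS, ROWS, counter_bits=16)

print(f"32-bit counters: {wide} per row, error factor {error_factor(wide):.2e}")
print(f"16-bit counters: {narrow} per row, error factor {error_factor(narrow):.2e}")
```

Halving the counter width doubles the counters per row and halves the error factor, but a narrow fixed-width counter saturates on heavy flows. A variable-width counter presumably aims to keep most counters narrow while still accommodating large counts; how the paper achieves this is not described in the text above.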