Direct Feature Access: Scaling Network Traffic Feature Collection to Terabit Speed

📅 2025-05-23
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the challenges of real-time, fine-grained feature extraction from encrypted traffic on Tbps-scale links, and the high latency (>100 ms) inherent in traditional CPU-centric architectures, this paper proposes a data-plane-native feature extraction paradigm with direct GPU integration. Flow-level features are extracted at line rate on a P4-programmable switch (Intel Tofino) and transferred to GPUs via RDMA and GPUDirect RDMA, enabling zero-copy delivery with no CPU intervention. Feature enrichment and AI inference are then performed directly on the GPU. This design achieves, for the first time, an end-to-end closed-loop analysis latency under 20 ms, with a per-port throughput of 31 million feature vectors per second and support for 524,000 concurrent flows. By bypassing control-plane bottlenecks, the system delivers scalable, ML-driven, real-time network monitoring for ultra-high-speed networks.


๐Ÿ“ Abstract
Real-time traffic monitoring is critical for network operators to ensure performance, security, and visibility, especially as encryption becomes the norm. AI and ML have emerged as powerful tools to create deeper insights from network traffic, but collecting the fine-grained features needed at terabit speeds remains a major bottleneck. We introduce Direct Feature Access (DFA): a high-speed telemetry system that extracts flow features at line rate using P4-programmable data planes, and delivers them directly to GPUs via RDMA and GPUDirect, completely bypassing the ML server's CPU. DFA enables feature enrichment and immediate inference on GPUs, eliminating traditional control plane bottlenecks and dramatically reducing latency. We implement DFA on Intel Tofino switches and NVIDIA A100 GPUs, achieving extraction and delivery of over 31 million feature vectors per second, supporting 524,000 flows within sub-20 ms monitoring periods, on a single port. DFA unlocks scalable, real-time, ML-driven traffic analysis at terabit speeds, pushing the frontier of what is possible for next-generation network monitoring.
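The headline numbers in the abstract are mutually consistent. A quick back-of-the-envelope check, using only the figures stated above, shows that exporting one feature vector for each of the 524,000 concurrent flows at 31 million vectors per second fits within the sub-20 ms monitoring period:

```python
# Back-of-the-envelope check of the throughput figures quoted in the abstract.
VECTORS_PER_SECOND = 31_000_000   # per-port delivery rate (from the abstract)
CONCURRENT_FLOWS = 524_000        # flows monitored per period (from the abstract)

# Time to export one feature vector per concurrent flow, in milliseconds.
export_time_ms = CONCURRENT_FLOWS / VECTORS_PER_SECOND * 1000
print(f"{export_time_ms:.1f} ms")  # ~16.9 ms, inside the sub-20 ms window
```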
Problem

Research questions and friction points this paper is trying to address.

Collecting fine-grained network traffic features at terabit speeds
Bypassing CPU bottlenecks for real-time ML-driven traffic analysis
Enabling scalable high-speed feature extraction and GPU inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

P4-programmable data planes for feature extraction
RDMA and GPUDirect for GPU bypass
Intel Tofino and NVIDIA A100 implementation
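The per-flow feature extraction named above runs in P4 on the Tofino pipeline; purely as a conceptual illustration (the flow key, field names, and packet fields below are hypothetical, not the paper's actual schema), the idea of accumulating per-flow statistics keyed by the 5-tuple can be sketched as:

```python
from collections import defaultdict

# Conceptual sketch only: the paper performs this in P4 register arrays on a
# Tofino switch; names and fields here are illustrative assumptions.
flow_table = defaultdict(lambda: {"pkts": 0, "bytes": 0})

def on_packet(five_tuple, length):
    """Update per-flow counters; one table entry becomes one feature vector."""
    entry = flow_table[five_tuple]
    entry["pkts"] += 1
    entry["bytes"] += length

# Example: two packets of one flow, one packet of another.
on_packet(("10.0.0.1", "10.0.0.2", 1234, 443, "tcp"), 1500)
on_packet(("10.0.0.1", "10.0.0.2", 1234, 443, "tcp"), 400)
on_packet(("10.0.0.3", "10.0.0.2", 5678, 443, "tcp"), 60)
print(len(flow_table))  # 2 flows tracked
```

In DFA these per-flow records are not polled by a CPU; they are shipped straight from the data plane into GPU memory via RDMA and GPUDirect.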
Lukas Froschauer
Faculty of Applied Computer Science, Deggendorf Institute of Technology, Deggendorf, Germany
Jonatan Langlet
EECS and Digital Futures, KTH Royal Institute of Technology, Stockholm, Sweden
Andreas Kassler
Karlstad University, Deggendorf Institute of Technology
Programmable Networks, Network Programmability, Network Virtualization, SDN/NFV, Data Center