Direct Feature Access: Scaling Network Traffic Feature Collection to Terabit Speed

📅 2025-05-23
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the challenges of real-time, fine-grained feature extraction from encrypted traffic on Tbps-scale links, and the high latency (>100 ms) inherent in traditional CPU-centric architectures, this paper proposes a data-plane-native feature extraction paradigm with direct GPU integration. Flow-level features are extracted at line rate on a P4-programmable switch (Intel Tofino) and transferred to GPUs via RDMA and GPUDirect RDMA, enabling zero-copy delivery with no CPU intervention. Feature enrichment and AI inference are then performed directly on the GPU. This design achieves, for the first time, an end-to-end closed-loop analysis latency under 20 ms, with a per-port throughput of 31 million feature vectors per second and support for 524,000 concurrent flows. By bypassing control-plane bottlenecks, the system delivers scalable, ML-driven, real-time network monitoring for ultra-high-speed networks.


๐Ÿ“ Abstract
Real-time traffic monitoring is critical for network operators to ensure performance, security, and visibility, especially as encryption becomes the norm. AI and ML have emerged as powerful tools to create deeper insights from network traffic, but collecting the fine-grained features needed at terabit speeds remains a major bottleneck. We introduce Direct Feature Access (DFA): a high-speed telemetry system that extracts flow features at line rate using P4-programmable data planes, and delivers them directly to GPUs via RDMA and GPUDirect, completely bypassing the ML server's CPU. DFA enables feature enrichment and immediate inference on GPUs, eliminating traditional control plane bottlenecks and dramatically reducing latency. We implement DFA on Intel Tofino switches and NVIDIA A100 GPUs, achieving extraction and delivery of over 31 million feature vectors per second, supporting 524,000 flows within sub-20 ms monitoring periods, on a single port. DFA unlocks scalable, real-time, ML-driven traffic analysis at terabit speeds, pushing the frontier of what is possible for next-generation network monitoring.
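The headline numbers in the abstract are mutually consistent. A quick back-of-the-envelope check, using only the figures stated above, shows that exporting one feature vector for each of the 524,000 concurrent flows at 31 million vectors per second fits within the sub-20 ms monitoring period:

```python
# Back-of-the-envelope check of the throughput figures quoted in the abstract.
VECTORS_PER_SECOND = 31_000_000   # per-port delivery rate (from the abstract)
CONCURRENT_FLOWS = 524_000        # flows monitored per period (from the abstract)

# Time to export one feature vector per concurrent flow, in milliseconds.
export_time_ms = CONCURRENT_FLOWS / VECTORS_PER_SECOND * 1000
print(f"{export_time_ms:.1f} ms")  # ~16.9 ms, inside the sub-20 ms window
```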
Problem

Research questions and friction points this paper is trying to address.

Collecting fine-grained network traffic features at terabit speeds
Bypassing CPU bottlenecks for real-time ML-driven traffic analysis
Enabling scalable high-speed feature extraction and GPU inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

P4-programmable data planes for feature extraction
RDMA and GPUDirect for GPU bypass
Intel Tofino and NVIDIA A100 implementation
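The per-flow feature extraction named above runs in P4 on the Tofino pipeline; purely as a conceptual illustration (the flow key, field names, and packet fields below are hypothetical, not the paper's actual schema), the idea of accumulating per-flow statistics keyed by the 5-tuple can be sketched as:

```python
from collections import defaultdict

# Conceptual sketch only: the paper performs this in P4 register arrays on a
# Tofino switch; names and fields here are illustrative assumptions.
flow_table = defaultdict(lambda: {"pkts": 0, "bytes": 0})

def on_packet(five_tuple, length):
    """Update per-flow counters; one table entry becomes one feature vector."""
    entry = flow_table[five_tuple]
    entry["pkts"] += 1
    entry["bytes"] += length

# Example: two packets of one flow, one packet of another.
on_packet(("10.0.0.1", "10.0.0.2", 1234, 443, "tcp"), 1500)
on_packet(("10.0.0.1", "10.0.0.2", 1234, 443, "tcp"), 400)
on_packet(("10.0.0.3", "10.0.0.2", 5678, 443, "tcp"), 60)
print(len(flow_table))  # 2 flows tracked
```

In DFA these per-flow records are not polled by a CPU; they are shipped straight from the data plane into GPU memory via RDMA and GPUDirect.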
Lukas Froschauer
Faculty of Applied Computer Science, Deggendorf Institute of Technology, Deggendorf, Germany
Jonatan Langlet
EECS and Digital Futures, KTH Royal Institute of Technology, Stockholm, Sweden
Andreas Kassler
Karlstad University, Deggendorf Institute of Technology
Programmable Networks, Network Programmability, Network Virtualization, SDN/NFV, Data Center