🤖 AI Summary
Existing ML-based network data-plane solutions (e.g., FlowLens, N3IC, BoS) struggle to simultaneously achieve low latency, high throughput, and high accuracy. This paper proposes FENIX, a hybrid architecture that splits the workload across heterogeneous hardware: feature extraction runs on a Tofino switch ASIC while deep neural network inference runs on a ZU19EG FPGA. A Data Engine based on a probabilistic token bucket regulates the sending rate of feature streams, bridging the inherent throughput mismatch between the ASIC and the FPGA, while a Model Engine enables complex DNN inference that resource-constrained switch chips cannot host on their own. Evaluation on real-world network traffic shows microsecond-level inference latency, multi-terabit throughput, low hardware overhead, and over 95% accuracy on mainstream traffic classification tasks, outperforming state-of-the-art approaches.
📝 Abstract
Machine learning (ML) is increasingly used in network data planes for advanced traffic analysis. However, existing solutions (such as FlowLens, N3IC, and BoS) still struggle to simultaneously achieve low latency, high throughput, and high accuracy. To address these challenges, we present FENIX, a hybrid in-network ML system that performs feature extraction on programmable switch ASICs and deep neural network inference on FPGAs. FENIX introduces a Data Engine that leverages a probabilistic token bucket algorithm to control the sending rate of feature streams, effectively closing the throughput gap between programmable switch ASICs and FPGAs. In addition, FENIX designs a Model Engine to enable high-accuracy deep neural network inference in the network, overcoming the difficulty of deploying complex models on resource-constrained switch chips. We implement FENIX on a programmable switch platform that directly integrates a Tofino ASIC and a ZU19EG FPGA, and evaluate it on real-world network traffic datasets. Our results show that FENIX achieves microsecond-level inference latency and multi-terabit throughput with low hardware overhead, and delivers over 95% accuracy on mainstream network traffic classification tasks, outperforming state-of-the-art approaches.
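To make the rate-control idea concrete, below is a minimal sketch of a probabilistic token bucket of the kind the Data Engine could use to throttle feature records flowing from the switch ASIC to the FPGA. The class name, refill parameters, and the admission rule (forward with probability equal to the bucket's fill ratio) are illustrative assumptions, not the published FENIX design.

```python
import random


class ProbabilisticTokenBucket:
    """Illustrative rate limiter: a feature record is forwarded with
    probability equal to the bucket's current fill ratio, so forwarding
    pressure eases off gracefully as the downstream FPGA falls behind.
    (Hypothetical sketch; parameters are not from the paper.)"""

    def __init__(self, rate_tokens_per_s: float, capacity: float):
        self.rate = rate_tokens_per_s   # token refill rate
        self.capacity = capacity        # maximum burst size
        self.tokens = capacity          # start with a full bucket
        self.last = 0.0                 # timestamp of last refill

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def admit(self, now: float, cost: float = 1.0) -> bool:
        """Return True if this feature record should be sent to the FPGA."""
        self._refill(now)
        if random.random() < self.tokens / self.capacity:
            self.tokens = max(0.0, self.tokens - cost)
            return True
        return False
```

Under this rule an empty bucket drops everything and a full bucket forwards essentially everything, with a smooth probabilistic taper in between rather than a hard cutoff.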