🤖 AI Summary
This work addresses the diverging performance and resource-efficiency requirements that emerging applications place on network switches. It proposes SPAC, a framework that automates the generation of FPGA-based switches through protocol-architecture co-design. SPAC combines a domain-specific language, a library of modular high-level synthesis (HLS) components, trace-aware design space exploration, and multi-fidelity simulation to jointly optimize protocol and microarchitecture. Experimental results show that tailoring the design to the workload reduces LUT usage by up to 55% and BRAM usage by up to 53%. Compared to fixed-architecture counterparts, SPAC-generated designs cut latency by 7.8% to 38.4% while keeping resource consumption and packet drop rates acceptable.
📝 Abstract
Network requirements are diverging across emerging applications: latency-critical services demand minimal logic delay, while hyperscale training and collectives require sustained line-rate throughput for synchronized bulk transfers. This divergence creates an urgent need for custom network switches tailored to specialized protocols and application-specific traffic patterns. This paper presents SPAC (Switch and Protocol Adaptive Customization), a novel approach that automates the generation of FPGA-based network switches co-optimized for both. SPAC introduces a unified workflow with a domain-specific language (DSL) for protocol-architecture co-design, a library of modular HLS-based adaptive switch components, and a trace-aware Design Space Exploration (DSE) engine. By providing a multi-fidelity simulation stack, SPAC enables rapid identification of Pareto-optimal designs prior to deployment. We demonstrate SPAC's domain-specific adaptation across a spectrum of real-world scenarios, from latency-sensitive sensor and HFT networks to hyperscale datacenter fabrics. Experimental results show that by tailoring the micro-architecture and protocol to the specific workload, SPAC-generated designs reduce LUT and BRAM usage by 55% and 53%, respectively. Compared to fixed-architecture counterparts, SPAC delivers latency reductions ranging from 7.8% to 38.4% across various tasks while keeping resource consumption and packet drop rate acceptable.
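To make the notion of Pareto-optimal designs concrete, the following is a minimal Python sketch of Pareto-front filtering over candidate switch designs. It is not SPAC's DSE engine: the `Design` type, the three objectives (LUTs, BRAMs, latency), and every number below are hypothetical values chosen only for illustration.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    """A hypothetical candidate; all three objectives are minimized."""
    name: str
    luts: int
    brams: int
    latency_ns: float


def dominates(a: Design, b: Design) -> bool:
    """True if `a` is no worse than `b` in every objective and better in one."""
    a_obj, b_obj = a[1:], b[1:]  # skip the name field
    no_worse = all(x <= y for x, y in zip(a_obj, b_obj))
    strictly_better = any(x < y for x, y in zip(a_obj, b_obj))
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep every design that no other design dominates (input order kept)."""
    return [
        d for d in designs
        if not any(dominates(other, d) for other in designs if other is not d)
    ]


# Illustrative numbers only; they are not results from the paper.
candidates = [
    Design("A", luts=40_000, brams=120, latency_ns=250.0),  # fastest, largest
    Design("B", luts=22_000, brams=60, latency_ns=350.0),   # smallest, slowest
    Design("C", luts=45_000, brams=130, latency_ns=330.0),  # dominated by A
    Design("D", luts=30_000, brams=90, latency_ns=290.0),   # middle trade-off
]

front = pareto_front(candidates)
print([d.name for d in front])
```

In this sketch, design C is discarded because A is at least as good in every objective, while A, B, and D each represent a different resource-versus-latency trade-off. The filter is quadratic in the number of candidates, which is adequate for small design spaces but would need a smarter search for large ones.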