🤖 AI Summary
This work addresses the urgent demand for high-performance interconnects in converged supercomputing and AI scenarios by developing a next-generation interconnect system, BXIv3, building upon the European BXI architecture, and outlining a roadmap toward BXIv4. Employing a hybrid development and co-design methodology, the project integrates commercial switching ASICs, custom IP cores, and FPGA-based NICs to construct a prototype at Technology Readiness Level (TRL) 8. Leveraging achievements of prior projects such as RED-SEA, the system undergoes comprehensive validation across scientific computing, AI workloads, and standard benchmarks, demonstrating substantial improvements in scalability and communication performance. This interconnect solution is positioned to provide critical infrastructure support for exascale and post-exascale systems beyond 2025.
📝 Abstract
NET4EXA aims to develop a next-generation high-performance interconnect for HPC and AI systems, addressing the increasing demands of large-scale infrastructures, such as those required for training Large Language Models. Building upon the proven BXI (Bull eXascale Interconnect) European technology used in TOP15 supercomputers, NET4EXA will deliver the new BXI release, BXIv3, a complete hardware and software interconnect solution, including switch and network interface components. The project will integrate a fully functional pilot system at TRL 8, ready for deployment in upcoming exascale and post-exascale systems from 2025 onward. Leveraging prior research from European initiatives like RED-SEA, the previous achievements of consortium partners, and over 20 years of expertise from BULL, NET4EXA also lays the groundwork for the future generation of BXI, BXIv4, providing analysis and a preliminary design. The project will use a hybrid development and co-design approach, combining commercial switch technology with custom IP and FPGA-based NICs. The performance of the NET4EXA BXIv3 interconnect will be evaluated using a broad portfolio of benchmarks, scalable scientific applications, and AI workloads.