UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture

📅 2025-03-26
🤖 AI Summary
To address the urgent demand for high-bandwidth, low-latency interconnects in large language model (LLM) training, this paper proposes UB-Mesh, a hierarchical, localized nD-FullMesh datacenter network architecture. Methodologically, it introduces: (1) a modular UB-Mesh-Pod design based on 4D-FullMesh; (2) a Unified Bus (UB) enabling dynamic I/O bandwidth allocation and hardware resource pooling; (3) All-Path-Routing (APR) for multi-path forwarding and a 64+1 redundancy mechanism for fault tolerance; and (4) topology-aware scheduling combined with short-distance direct links to enhance data locality. Experimental evaluation demonstrates that UB-Mesh achieves a 2.04× improvement in cost efficiency over conventional Clos networks, increases network availability by 7.2%, and attains 95%+ linearity in LLM training. These results underscore UB-Mesh’s effectiveness in supporting scalable, reliable, and cost-efficient distributed AI training.

📝 Abstract
As Large Language Models (LLMs) continue to scale, the requisite computational power and bandwidth escalate. To address this, we introduce UB-Mesh, a novel AI datacenter network architecture designed to enhance scalability, performance, cost-efficiency and availability. Unlike traditional datacenters that provide symmetrical node-to-node bandwidth, UB-Mesh employs a hierarchically localized nD-FullMesh network topology. This design fully leverages the data locality of LLM training, prioritizing short-range, direct interconnects to minimize data movement distance and reduce switch usage. Although UB-Mesh's nD-FullMesh topology offers several theoretical advantages, its concrete architecture design, physical implementation and networking system optimization present new challenges. For the actual construction of UB-Mesh, we first design the UB-Mesh-Pod architecture, which is based on a 4D-FullMesh topology. UB-Mesh-Pod is implemented via a suite of hardware components that serve as the foundational building blocks, including specifically designed NPU, CPU, Low-Radix-Switch (LRS), High-Radix-Switch (HRS), NICs and others. These components are interconnected via a novel Unified Bus (UB) technique, which enables flexible IO bandwidth allocation and hardware resource pooling. For networking system optimization, we propose an advanced routing mechanism named All-Path-Routing (APR) to efficiently manage data traffic. These optimizations, combined with topology-aware performance enhancements and robust reliability measures such as a 64+1 backup design, result in 2.04x higher cost-efficiency and 7.2% higher network availability compared to a traditional Clos architecture, and 95%+ linearity in various LLM training tasks.
Problem

Research questions and friction points this paper is trying to address.

Enhancing scalability and performance for large-scale AI datacenters
Reducing data movement distance in LLM training networks
Optimizing cost-efficiency and availability in datacenter topology
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchically localized nD-FullMesh topology
Unified Bus technique for flexible IO
All-Path-Routing for efficient traffic management
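
The nD-FullMesh idea can be illustrated with a small sketch. It assumes the common reading of a multi-dimensional full mesh: each node has an n-coordinate address and is directly linked to every node that differs from it in exactly one coordinate. That reading, the function names, and the dimension sizes below are illustrative assumptions, not details taken from the paper.

```python
from itertools import product


def full_mesh_neighbors(node, dims):
    """Nodes that differ from `node` in exactly one coordinate (assumed link rule)."""
    neighbors = []
    for axis, size in enumerate(dims):
        for value in range(size):
            if value != node[axis]:
                neighbors.append(node[:axis] + (value,) + node[axis + 1:])
    return neighbors


def count_links(dims):
    """Total number of undirected direct links in the assumed topology."""
    nodes = list(product(*(range(size) for size in dims)))
    degree = sum(size - 1 for size in dims)  # same for every node
    return len(nodes) * degree // 2


def hop_distance(a, b):
    """Minimum hops between two nodes: one hop per differing coordinate."""
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    dims = (4, 4)  # hypothetical 2D-FullMesh with 16 nodes
    print(len(full_mesh_neighbors((0, 0), dims)))  # direct neighbors of one node
    print(count_links(dims))                       # direct links overall
    print(hop_distance((0, 0), (3, 2)))            # hops between two nodes
```

Under this assumption, nodes that share all but one coordinate are one hop apart without passing through a switch, which is the kind of short-range locality the topology is meant to exploit.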
Authors
Heng Liao
Huawei
Bingyang Liu
Tsinghua University
Computer Networks
Xianping Chen
Huawei
Zhigang Guo
Huawei
Chuanning Cheng
Huawei
Jianbing Wang
Huawei
Xiangyu Chen
Huawei
Peng Dong
Shanghai Jiao Tong University
Sensor networks, information fusion, nonlinear filtering, target tracking, aircraft navigation, guidance and control
Rui Meng
Salesforce Research
Machine Learning, Natural Language Processing
Wenjie Liu
Huawei
Zhe Zhou
Huawei
Ziyang Zhang
Huawei
Yuhang Gai
Huawei
Cunle Qian
Huawei
Yi Xiong
Huawei
Zhongwu Cheng
Huawei
Jing Xia
Huawei
Yu-Long Ma
Huawei
Xi Chen
Huawei
Wenhua Du
Huawei
Shizhong Xiao
Huawei
Chungang Li
Huawei
Yong Qin
Huawei
Liudong Xiong
Huawei
Zhou Yu
Huawei
Lv Chen
Huawei
Lei Chen
Huawei
Buyun Wang
Huawei
Pei Wu
Huawei
Junen Gao
Huawei
Xiao-Chun Li
Huawei
Jian He
Huawei
Shizhuan Yan
Huawei
Bill McColl
Huawei