ScalePool: Hybrid XLink-CXL Fabric for Composable Resource Disaggregation in Unified Scale-up Domains

📅 2025-10-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Accelerator clusters struggle with resource disaggregation, interoperability across heterogeneous hardware, and inconsistent memory hierarchy management. Method: This paper proposes ScalePool, a cluster architecture built on a hybrid XLink-CXL fabric: XLink provides low-latency accelerator-to-accelerator communication within a cluster, while hierarchical CXL switching fabrics provide scalable, coherent memory sharing across clusters, supporting memory pooling and composable resource disaggregation. ScalePool abstracts interfaces through CXL to resolve interoperability constraints between heterogeneous hardware, and introduces explicit memory tiering: a latency-critical tier-1 that combines accelerator-local memory with coherence-centric CXL and XLink, and a high-capacity tier-2 of dedicated memory nodes on a CXL-based fabric. Results: Compared with conventional RDMA-based environments, ScalePool accelerates LLM training by 1.22× on average and up to 1.84×, and its tier-2 memory disaggregation reduces latency by up to 4.5× for memory-intensive workloads.

📝 Abstract

This paper proposes ScalePool, a novel cluster architecture designed to interconnect numerous accelerators using unified hardware interconnects rather than traditional long-distance networking. ScalePool integrates Accelerator-Centric Links (XLink) and Compute Express Link (CXL) into a unified XLink-CXL hybrid fabric. Specifically, ScalePool employs XLink for intra-cluster, low-latency accelerator communication, while using hierarchical CXL-based switching fabrics for scalable and coherent inter-cluster memory sharing. By abstracting interfaces through CXL, ScalePool structurally resolves interoperability constraints, enabling heterogeneous cluster operation and composable resource disaggregation. In addition, ScalePool introduces explicit memory tiering: the latency-critical tier-1 combines accelerator-local memory with coherence-centric CXL and XLink, whereas the high-capacity tier-2 employs dedicated memory nodes interconnected by a CXL-based fabric, achieving scalable and efficient memory pooling. Evaluation results show that ScalePool accelerates LLM training by 1.22x on average and up to 1.84x compared to conventional RDMA-based environments. Furthermore, the proposed tier-2 memory disaggregation strategy reduces latency by up to 4.5x for memory-intensive workloads.
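The explicit tiering described in the abstract can be pictured with a small placement sketch. This is only an illustration under stated assumptions, not ScalePool's actual policy: the `Buffer` type, the `latency_critical` flag, the greedy ordering, and the `place` function are all hypothetical names introduced here. The only things taken from the abstract are the two tiers themselves (latency-critical tier-1, high-capacity tier-2).

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    TIER1 = "tier-1"  # accelerator-local memory, reached coherently via CXL and XLink
    TIER2 = "tier-2"  # dedicated memory nodes behind a CXL-based fabric


@dataclass
class Buffer:
    name: str
    size_bytes: int
    latency_critical: bool  # hypothetical hint; the abstract does not define one


def place(buffers, tier1_capacity_bytes):
    """Assumed greedy policy: only latency-critical buffers are eligible for
    tier-1, and they are admitted smallest-first while capacity remains.
    Everything else (non-critical, or critical but not fitting) goes to tier-2."""
    placement = {}
    free = tier1_capacity_bytes
    # Sort so latency-critical buffers come first, smaller ones before larger ones.
    ordered = sorted(buffers, key=lambda b: (not b.latency_critical, b.size_bytes))
    for buf in ordered:
        if buf.latency_critical and buf.size_bytes <= free:
            placement[buf.name] = Tier.TIER1
            free -= buf.size_bytes
        else:
            placement[buf.name] = Tier.TIER2
    return placement
```

For example, with a tier-1 capacity of 10 units, a 4-unit latency-critical buffer is admitted to tier-1, while an 8-unit latency-critical buffer that no longer fits and a 16-unit non-critical buffer both land in tier-2. A real system would presumably also migrate data between tiers over time, which this sketch does not attempt.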

Problem

Research questions and friction points this paper is trying to address.

Interconnecting accelerators with unified hardware interconnects instead of traditional long-distance networking

Resolving interoperability constraints for heterogeneous cluster operation

Achieving scalable memory pooling with explicit tiering strategy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid XLink-CXL fabric for composable resource disaggregation

XLink enables low-latency intra-cluster accelerator communication

Hierarchical CXL switching enables scalable coherent memory sharing
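The division of labor between the two interconnects can be sketched as a simple path-selection rule. This is a minimal illustration, assuming each accelerator carries a cluster identifier; the `Accelerator` type and the `select_link` function are hypothetical and are not taken from the paper. The rule itself (XLink within a cluster, CXL switching across clusters) follows the summary above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    cluster_id: int  # hypothetical identifier for the cluster the device sits in


def select_link(src: Accelerator, dst: Accelerator) -> str:
    """Pick the fabric for a transfer between two accelerators.

    Assumed rule: same device needs no fabric, same cluster uses XLink,
    and different clusters go through the hierarchical CXL switching fabric.
    """
    if src == dst:
        return "local"
    if src.cluster_id == dst.cluster_id:
        return "XLink"
    return "CXL"


# Usage: two devices in cluster 0 and one in cluster 1.
a0 = Accelerator("acc0", cluster_id=0)
a1 = Accelerator("acc1", cluster_id=0)
b0 = Accelerator("acc2", cluster_id=1)
print(select_link(a0, a1), select_link(a0, b0))
```

Keeping the decision as a pure function of cluster membership mirrors the idea that software sees one unified fabric, while the underlying path differs by distance.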
