🤖 AI Summary
Traditional electrical interconnects face critical bottlenecks in AI training, including insufficient bandwidth, high communication latency, and fragmented multi-tenant resource allocation. To address these challenges, this work proposes a rack-scale compute architecture built on silicon photonics integration and chip-to-chip optical interconnects. It introduces what the authors describe as the first hardware-reconfigurable optical switching network to enable fragmentation-free, fine-grained, and strongly isolated multi-tenant resource slicing. Collective communication is further co-optimized with the optical fabric, accelerating intra-rack collective operations by 74% and improving end-to-end large-model training throughput by 1.7×. By overcoming the bandwidth and latency limits inherent to electrical interconnects, the architecture establishes a scalable, high-density, multi-tenant optical interconnect paradigm for next-generation AI training systems.
📝 Abstract
We present a rack-scale compute architecture for ML using multi-accelerator servers connected via chip-to-chip silicon photonic components. Our architecture achieves (1) multi-tenant resource slicing without fragmentation, (2) 74% faster rack-scale collective communication, and (3) 1.7× speedup in end-to-end ML training throughput.
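To make the fragmentation-free slicing claim concrete, here is a minimal toy model (not the paper's implementation; class and method names are invented for illustration). The key idea it sketches: because a reconfigurable optical circuit switch can wire *any* subset of accelerators into a tenant's topology (a ring, here), an allocator never needs physically contiguous accelerators, so freed capacity is always reusable.

```python
# Toy sketch of fragmentation-free multi-tenant slicing over a
# reconfigurable optical circuit switch. Hypothetical API, for
# illustration only -- not the paper's actual system.

class OpticalRack:
    def __init__(self, num_accels):
        self.free = set(range(num_accels))
        self.tenants = {}  # tenant id -> accelerator ids in ring order

    def allocate(self, tenant, size):
        """Claim any `size` free accelerators and return the ring links
        the circuit switch would be configured to provide."""
        if size > len(self.free):
            raise RuntimeError("insufficient capacity")
        slice_ = sorted(self.free)[:size]  # non-contiguous ids are fine
        self.free -= set(slice_)
        self.tenants[tenant] = slice_
        # Circuit-switch configuration: optical link i -> (i+1) mod size
        return [(slice_[i], slice_[(i + 1) % size]) for i in range(size)]

    def release(self, tenant):
        self.free |= set(self.tenants.pop(tenant))


rack = OpticalRack(8)
rack.allocate("A", 3)        # accelerators {0, 1, 2}
rack.allocate("B", 4)        # accelerators {3, 4, 5, 6}
rack.release("A")            # frees {0, 1, 2}; {7} was still free
links = rack.allocate("C", 4)  # gets {0, 1, 2, 7}: a non-contiguous,
                               # yet fully usable, ring -- no fragmentation
```

An electrical fabric with fixed point-to-point wiring would struggle with the final allocation, since tenant C's accelerators are not physically adjacent; the reconfigurable switch makes adjacency a software decision.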