Chip-to-chip photonic connectivity in multi-accelerator servers for ML

📅 2025-01-30
🤖 AI Summary
Electrical interconnects create three bottlenecks for AI training: insufficient bandwidth, high communication latency, and fragmented multi-tenant resource allocation. To address them, this work proposes a rack-scale compute architecture in which multi-accelerator servers are connected through chip-to-chip silicon photonic interconnects. It introduces a hardware-reconfigurable optical switching network, described as the first of its kind, that enables fragmentation-free, fine-grained, and strongly isolated multi-tenant resource slicing. Collective communication is co-optimized with the optical fabric, making rack-scale collective operations 74% faster and yielding a 1.7× speedup in end-to-end large-model training throughput. The result is a scalable, high-density, multi-tenant optical interconnect design for next-generation AI training systems.

📝 Abstract
We present a rack-scale compute architecture for ML using multi-accelerator servers connected via chip-to-chip silicon photonic components. Our architecture achieves (1) multi-tenanted resource slicing without fragmentation, (2) 74% faster rack-scale collective communication, and (3) 1.7X speedup in end-to-end ML training throughput.
Problem

Research questions and friction points this paper is trying to address.

Resource Management
Communication Speed
Machine Learning Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Photonics Technology
Supercomputer Architecture
Machine Learning Acceleration
Abhishek Vijaya Kumar
Cornell University
Arjun Devraj
Cornell University
Darius Bunandar
Lightmatter, Inc.
Rachee Singh
Cornell University
Networking, Networked Systems