🤖 AI Summary
Traditional electrical interconnects face critical bottlenecks in AI training, including insufficient bandwidth, high communication latency, and fragmented multi-tenant resource allocation. To address these challenges, this work proposes a rack-scale compute architecture built on silicon photonics integration and chip-to-chip optical interconnects. It introduces what the authors describe as the first hardware-reconfigurable optical switching network to enable fragmentation-free, fine-grained, and strongly isolated multi-tenant resource slicing. Collective communication is further co-optimized with the optical fabric, accelerating intra-rack collective operations by 74% and improving end-to-end large-model training throughput by 1.7×. By overcoming the bandwidth and latency limits inherent to electrical interconnects, the architecture establishes a scalable, high-density, multi-tenant optical interconnect paradigm for next-generation AI training systems.
📝 Abstract
We present a rack-scale compute architecture for ML using multi-accelerator servers connected via chip-to-chip silicon photonic components. Our architecture achieves (1) multi-tenant resource slicing without fragmentation, (2) 74% faster rack-scale collective communication, and (3) 1.7× speedup in end-to-end ML training throughput.
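To make the fragmentation-free slicing claim concrete, here is a minimal toy model (not the paper's implementation; class and method names are invented for illustration). The key idea it sketches: because a reconfigurable optical circuit switch can wire *any* subset of accelerators into a tenant's topology (a ring, here), an allocator never needs physically contiguous accelerators, so freed capacity is always reusable.

```python
# Toy sketch of fragmentation-free multi-tenant slicing over a
# reconfigurable optical circuit switch. Hypothetical API, for
# illustration only -- not the paper's actual system.

class OpticalRack:
    def __init__(self, num_accels):
        self.free = set(range(num_accels))
        self.tenants = {}  # tenant id -> accelerator ids in ring order

    def allocate(self, tenant, size):
        """Claim any `size` free accelerators and return the ring links
        the circuit switch would be configured to provide."""
        if size > len(self.free):
            raise RuntimeError("insufficient capacity")
        slice_ = sorted(self.free)[:size]  # non-contiguous ids are fine
        self.free -= set(slice_)
        self.tenants[tenant] = slice_
        # Circuit-switch configuration: optical link i -> (i+1) mod size
        return [(slice_[i], slice_[(i + 1) % size]) for i in range(size)]

    def release(self, tenant):
        self.free |= set(self.tenants.pop(tenant))


rack = OpticalRack(8)
rack.allocate("A", 3)        # accelerators {0, 1, 2}
rack.allocate("B", 4)        # accelerators {3, 4, 5, 6}
rack.release("A")            # frees {0, 1, 2}; {7} was still free
links = rack.allocate("C", 4)  # gets {0, 1, 2, 7}: a non-contiguous,
                               # yet fully usable, ring -- no fragmentation
```

An electrical fabric with fixed point-to-point wiring would struggle with the final allocation, since tenant C's accelerators are not physically adjacent; the reconfigurable switch makes adjacency a software decision.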