AI Summary
This work proposes SKYLIGHT, a scalable 3D photonic in-memory tensor core architecture for real-time AI inference. It targets the scalability and reliability barriers of existing photonic architectures by co-designing topology, wavelength routing, signal accumulation, and weight programming within a 3D stack. Key innovations include a low-loss 3D Si/SiN crossbar, a thermally robust WDM component that does not rely on micro-ring resonators, hierarchical signal accumulation using a multi-port photodetector, and optically programmed non-volatile phase-change material (PCM) weights that support in-situ updates. In system-level modeling, a single 144×256 core fits within a single reticle and delivers 342.1 TOPS at 23.7 TOPS/W; on ResNet-50 it reaches 1212 FPS at 27 mJ per image, with an end-to-end efficiency of 84.17 FPS/W, 1.61× higher than an NVIDIA RTX PRO 6000 Blackwell GPU. With noise-aware training, it maintains high task accuracy under realistic hardware non-idealities.
Abstract
The growing computational demands of artificial intelligence (AI) are challenging conventional electronics, making photonic computing a promising alternative. However, existing photonic architectures face fundamental scalability and reliability barriers. This paper introduces SKYLIGHT, a scalable 3D photonic in-memory tensor core architecture designed for real-time AI inference. SKYLIGHT overcomes these barriers by co-designing its topology, wavelength routing, accumulation, and programming in a 3D stack. Its innovations include a low-loss 3D Si/SiN crossbar topology, a thermally robust wavelength-division multiplexing (WDM) component that is not based on micro-ring resonators (MRRs), hierarchical signal accumulation using a multi-port photodetector (PD), and optically programmed non-volatile phase-change material (PCM) weights. Importantly, SKYLIGHT enables in-situ weight updates that support label-free, layer-local learning (e.g., forward-forward local updates) in addition to inference. Using SimPhony for system-level modeling, we show that a single 144 × 256 SKYLIGHT core is feasible within a single reticle and delivers 342.1 TOPS at 23.7 TOPS/W, enabling ResNet-50 inference at 1212 FPS with 27 mJ per image. End to end, it achieves 84.17 FPS/W, 1.61× higher than an NVIDIA RTX PRO 6000 Blackwell GPU under the same workload in real-time measurements. System-level evaluations on four representative machine learning tasks, including unsupervised local self-learning, demonstrate SKYLIGHT's robustness to realistic hardware non-idealities: low-bit quantization and signal-proportional analog noise that captures modulation, PCM programming, and readout variations. With noise-aware training, SKYLIGHT maintains high task accuracy, validating its potential as a comprehensive solution for energy-efficient, large-scale photonic AI accelerators.
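The non-idealities named in the abstract (low-bit quantization plus signal-proportional noise on modulation, PCM programming, and readout) can be illustrated with a minimal NumPy sketch. This is not the paper's SimPhony model. The function names `quantize` and `noisy_matvec`, the 4-bit default, the single shared noise scale `sigma`, and the mapping of the 144 × 256 core to a weight matrix of shape `(144, 256)` are all assumptions made for illustration.

```python
import numpy as np


def quantize(x, bits, x_max):
    """Uniform symmetric quantization to `bits` bits over [-x_max, x_max]."""
    if x_max == 0:
        return np.zeros_like(x, dtype=float)
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels for 4 bits
    x = np.clip(x, -x_max, x_max)
    return np.round(x / x_max * levels) / levels * x_max


def noisy_matvec(W, x, bits=4, sigma=0.02, rng=None):
    """Matrix-vector product with hypothetical hardware non-idealities.

    Noise is signal-proportional: each value v is perturbed by a Gaussian
    with standard deviation sigma * |v| (an assumed, simplified model).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Low-bit quantization of weights and inputs.
    Wq = quantize(W, bits, np.abs(W).max())
    xq = quantize(x, bits, np.abs(x).max())

    # PCM programming variation on the stored weights.
    Wn = Wq * (1.0 + sigma * rng.standard_normal(Wq.shape))
    # Modulation noise on the encoded inputs.
    xn = xq * (1.0 + sigma * rng.standard_normal(xq.shape))

    # Accumulation, followed by readout variation on the result.
    y = Wn @ xn
    return y * (1.0 + sigma * rng.standard_normal(y.shape))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.uniform(-1.0, 1.0, size=(144, 256))   # assumed core mapping
    x = rng.uniform(0.0, 1.0, size=256)
    y_ideal = W @ x
    y_noisy = noisy_matvec(W, x, bits=4, sigma=0.02, rng=rng)
    rel_err = np.linalg.norm(y_noisy - y_ideal) / np.linalg.norm(y_ideal)
    print(f"relative error: {rel_err:.3f}")
```

In noise-aware training, a forward pass of this kind would replace the ideal product during training so that the learned weights tolerate the perturbations; the bit width and noise scale used here are placeholders, not values reported by the authors.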