ChipLight: Cross-Layer Optimization of Chiplet Design with Optical Interconnects for LLM Training

📅 2026-04-20

🤖 AI Summary
This work addresses the growing communication bottleneck in large-model training, where inter-device data transfer limits performance and calls for coordinated bandwidth and efficiency improvements both within and across packages. The paper proposes a cross-layer, multi-objective optimization framework that, for the first time, jointly models chiplet architecture, training parallelization strategy, and optical interconnect network topology, enabling co-design from chip scale to system scale. The design space exploration combines black-box and white-box approaches to search this joint space efficiently, and the resulting designs improve training throughput. The study shows that cross-layer co-design is feasible and effective in communication-intensive scenarios and offers design guidance for future large-model training clusters.

📝 Abstract
In large-scale distributed LLM training, communication between devices becomes the key performance bottleneck. Chiplet technology can integrate multiple dies into a package to scale up node performance with higher bandwidth. Meanwhile, optical interconnect (OI) technology offers long-reach, high-bandwidth links, making it well suited for scale-out networks. The combination of these two technologies has the potential to overcome communication bottlenecks within and across packages. In this work, we present ChipLight, a cross-layer multi-objective design and optimization method for training clusters that leverage chiplets and OI. We first abstract an architecture model for such complex clusters, co-optimizing chiplet architecture, training parallelization strategy, and OI network topology. Based on this model, we tailor the design space exploration flow by combining black-box and white-box methodologies. Our experimental results show that ChipLight achieves significantly improved training efficiency and provides valuable design insights for the development of future training clusters.
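To make the idea of co-optimizing hardware and parallelization strategy concrete, here is a minimal toy sketch in Python. It is not ChipLight's method or cost model, which the abstract does not specify. Every function name, parameter, and constant below is hypothetical. The outer loop over hardware candidates stands in for a black-box search, the closed-form `step_time` stands in for a white-box analytical model, and the sketch minimizes a single objective rather than the multiple objectives the paper targets.

```python
def factorizations(n):
    """All (dp, tp, pp) triples of positive integers with dp * tp * pp == n."""
    out = []
    for tp in range(1, n + 1):
        if n % tp:
            continue
        rest = n // tp
        for pp in range(1, rest + 1):
            if rest % pp == 0:
                out.append((rest // pp, tp, pp))
    return out


def step_time(dp, tp, pp, dies_per_package, intra_bw, optical_bw,
              flops=1.0e6, tp_bytes=4.0e3, dp_bytes=1.0e3):
    """Toy analytical estimate of one training step (arbitrary units).

    All terms are illustrative assumptions, not values from the paper.
    """
    devices = dp * tp * pp
    compute = flops / devices
    # Assumption: tensor-parallel traffic uses the fast in-package links
    # only when the whole tensor-parallel group fits in one package.
    tp_bw = intra_bw if tp <= dies_per_package else optical_bw
    tp_comm = tp_bytes * (tp - 1) / tp / tp_bw
    # Assumption: data-parallel gradient exchange crosses the optical network.
    dp_comm = dp_bytes * (dp - 1) / dp / optical_bw
    # Toy penalty that grows with the number of pipeline stages.
    bubble = compute * (pp - 1) / 8
    return compute + tp_comm + dp_comm + bubble


def explore(devices, hardware_candidates):
    """Return (time, hardware, (dp, tp, pp)) with the lowest estimated time."""
    best = None
    for hw in hardware_candidates:                    # outer: hardware choice
        for dp, tp, pp in factorizations(devices):    # inner: parallel strategy
            t = step_time(dp, tp, pp, **hw)
            if best is None or t < best[0]:
                best = (t, hw, (dp, tp, pp))
    return best


if __name__ == "__main__":
    candidates = [
        {"dies_per_package": 2, "intra_bw": 100.0, "optical_bw": 10.0},
        {"dies_per_package": 4, "intra_bw": 100.0, "optical_bw": 10.0},
    ]
    print(explore(8, candidates))
```

In this toy model the best strategy depends on the hardware candidate, since a larger package lets a wider tensor-parallel group stay on the in-package links. That coupling is the kind of cross-layer interaction the abstract argues should be optimized jointly rather than one layer at a time.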
Problem

Research questions and friction points this paper is trying to address.

LLM training
communication bottleneck
chiplet
optical interconnect
distributed training
Innovation

Methods, ideas, or system contributions that make the work stand out.

chiplet
optical interconnect
cross-layer optimization
LLM training
design space exploration