🤖 AI Summary
To address the lack of fast, high-fidelity co-simulation methods for chiplet-based systems, this paper proposes CHIPSIM, the first co-simulation framework for deep learning workloads that profiles power at microsecond granularity to support transient thermal analysis. The framework jointly models computation, network-on-interposer (NoI) communication, dynamic power dissipation, and 3D thermal diffusion, capturing network contention, pipelining effects, and electro-thermal coupling. It simulates parallel DNN execution on both homogeneous and heterogeneous chiplet configurations and across different NoI topologies. Experimental evaluation shows up to a 340% improvement in accuracy over conventional simulation approaches while retaining flexibility and scalability. By enabling rapid, physics-informed assessment of system-level performance and thermal behavior, the framework offers an efficient tool for chiplet architecture design and optimization.
📝 Abstract
Due to reduced manufacturing yields, traditional monolithic chips cannot keep up with the compute, memory, and communication demands of data-intensive applications, such as rapidly growing deep neural network (DNN) models. Chiplet-based architectures offer a cost-effective and scalable solution by integrating smaller chiplets via a network-on-interposer (NoI). Fast and accurate simulation approaches are critical to unlocking this potential, but existing methods lack the required accuracy, speed, and flexibility. To address this need, this work presents CHIPSIM, a comprehensive co-simulation framework designed for parallel DNN execution on chiplet-based systems. CHIPSIM concurrently models computation and communication, accurately capturing network contention and pipelining effects that conventional simulators overlook. Furthermore, it profiles chiplet and NoI power consumption at microsecond granularity for precise transient thermal analysis. Extensive evaluations with homogeneous and heterogeneous chiplets and different NoI architectures demonstrate the framework's versatility, up to 340% accuracy improvement, and power/thermal analysis capability.
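As a rough illustration of why microsecond-granularity power profiling matters for transient thermal analysis, the sketch below integrates a single-node lumped RC thermal model, C·dT/dt = P(t) − (T − T_amb)/R, with forward Euler. This is not CHIPSIM's thermal model: the function name `simulate_lumped_thermal`, the resistance and capacitance values, and both power traces are hypothetical and chosen only to make the effect visible.

```python
def simulate_lumped_thermal(power_w, dt_s, r_th, c_th, t_ambient=25.0):
    """Forward-Euler integration of C*dT/dt = P(t) - (T - T_amb)/R.

    power_w   : power samples in watts, one per time step
    dt_s      : time step in seconds
    r_th      : thermal resistance to ambient in K/W (assumed value)
    c_th      : thermal capacitance in J/K (assumed value)
    Returns the temperature in degrees Celsius after each step.
    """
    temps = []
    temp = t_ambient
    for p in power_w:
        # Heat in minus heat leaking to ambient, divided by capacitance.
        temp += dt_s * (p - (temp - t_ambient) / r_th) / c_th
        temps.append(temp)
    return temps


# Hypothetical parameters: 1 us step, time constant R*C = 200 us.
DT = 1e-6
R_TH = 2.0
C_TH = 1e-4

# Two traces with the same average power (10 W) over 1 ms:
# a bursty one (20 W for 500 us, then idle) and a flat one.
bursty_trace = [20.0] * 500 + [0.0] * 500
averaged_trace = [10.0] * 1000

peak_bursty = max(simulate_lumped_thermal(bursty_trace, DT, R_TH, C_TH))
peak_averaged = max(simulate_lumped_thermal(averaged_trace, DT, R_TH, C_TH))

print(f"peak with microsecond trace: {peak_bursty:.1f} C")
print(f"peak with averaged trace:    {peak_averaged:.1f} C")
```

Under these assumed values the bursty trace reaches a higher peak temperature than the averaged trace, even though both deliver the same energy. A coarse power average would hide that transient hotspot, which is the kind of behavior fine-grained profiling is meant to expose.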