Parendi: Thousand-Way Parallel RTL Simulation

📅 2024-03-07

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0


🤖 AI Summary

Traditional single-threaded RTL simulation suffers severe performance bottlenecks as chip complexity grows, which limits verification scalability. Method: This paper proposes a cycle-accurate RTL simulation approach designed for thousand-core parallelism on the Graphcore IPU architecture. It introduces new RTL partitioning and compilation techniques that exploit fine-grained parallelism, along with strategies for reducing synchronization and communication costs. Contribution/Results: Parendi scales to 5,888 cores across 4 IPU sockets, and the paper systematically quantifies the synchronization, communication, and computation overheads of parallel RTL simulation. On this 4-IPU system, it runs large RTL designs up to 4× faster than the most powerful state-of-the-art x64 multicore systems. The work points to a scalable, hardware-accelerated path for simulating very large hardware designs.


📝 Abstract

Hardware development critically depends on cycle-accurate RTL simulation. However, as chip complexity increases, conventional single-threaded simulation becomes impractical due to stagnant single-core performance. Parendi is an RTL simulator that addresses this challenge by exploiting the abundant fine-grained parallelism inherent in RTL simulation and efficiently mapping it onto the massively parallel Graphcore IPU (Intelligence Processing Unit) architecture. Parendi scales up to 5888 cores on 4 Graphcore IPU sockets. It allows us to run large RTL designs up to 4× faster than the most powerful state-of-the-art x64 multicore systems. To achieve this performance, we developed new partitioning and compilation techniques and carefully quantified the synchronization, communication, and computation costs of parallel RTL simulation: The paper comprehensively analyzes these factors and details the strategies that Parendi uses to optimize them.
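To make "mapping fine-grained RTL parallelism onto many cores" concrete, here is a minimal sketch of cycle-accurate simulation split across cores in a bulk-synchronous style: each simulated clock cycle is a compute phase, an exchange phase, and a barrier. This is an illustration under stated assumptions, not Parendi's implementation. The `Tile` class, `simulate`, and the toy counter design (`counter_fns`, `build_counter`) are hypothetical names chosen here. Tiles are run one after another in plain Python, with comments marking where real hardware would run them in parallel and synchronize.

```python
from typing import Callable, Dict, List

# Next-state function: maps current register values to one register's next value.
NextFn = Callable[[Dict[str, int]], int]


class Tile:
    """Stand-in for one core: owns some registers and keeps a local copy of
    every value it reads (its own registers plus remote inputs)."""

    def __init__(self, owned: Dict[str, NextFn], reads: List[str]):
        self.owned = owned
        self.local: Dict[str, int] = {r: 0 for r in set(owned) | set(reads)}
        self.pending: Dict[str, int] = {}

    def compute(self) -> None:
        # Compute phase: only local copies of current-cycle values are read.
        self.pending = {r: fn(self.local) for r, fn in self.owned.items()}


def simulate(tiles: List[Tile], cycles: int) -> Dict[str, int]:
    for _ in range(cycles):
        for t in tiles:  # compute phase; on real hardware tiles run in parallel
            t.compute()
        # Barrier: every tile has finished computing before values are exchanged.
        for src in tiles:  # exchange phase: push new values to every reader
            for reg, val in src.pending.items():
                for dst in tiles:
                    if reg in dst.local:
                        dst.local[reg] = val
        # Barrier again before the next simulated clock cycle.
    return {r: t.local[r] for t in tiles for r in t.owned}


def counter_fns() -> Dict[str, NextFn]:
    # Hypothetical toy design: 2-bit counter (c1, c0) plus a register s that
    # samples c1 one cycle late.
    return {
        "c0": lambda v: v["c0"] ^ 1,
        "c1": lambda v: v["c1"] ^ v["c0"],
        "s": lambda v: v["c1"],
    }


def build_counter() -> List[Tile]:
    f = counter_fns()
    # Tile 0 owns c0 and s and reads c1 remotely; tile 1 owns c1 and reads c0.
    return [Tile({"c0": f["c0"], "s": f["s"]}, ["c1"]),
            Tile({"c1": f["c1"]}, ["c0"])]


def reference(cycles: int) -> Dict[str, int]:
    # Single-threaded golden model: evaluate all registers from one global state.
    fns = counter_fns()
    state = {r: 0 for r in fns}
    for _ in range(cycles):
        state = {r: fn(state) for r, fn in fns.items()}
    return state


if __name__ == "__main__":
    for n in range(5):
        assert simulate(build_counter(), n) == reference(n)
    print(simulate(build_counter(), 3))
```

The partitioned run matches the single-threaded reference because every tile reads only the previous cycle's values during the compute phase. New values become visible only after the exchange. This double-buffered discipline is what keeps a parallel simulation cycle-accurate.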

Problem

Research questions and friction points this paper is trying to address.

Addresses stagnation in single-core RTL simulation performance

Exploits fine-grained parallelism for efficient RTL simulation

Optimizes synchronization, communication, and computation costs
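The three costs listed above combine in every simulated clock cycle. Here is a deliberately simplified per-cycle cost model for a bulk-synchronous parallel simulation. It assumes the compute phase lasts as long as the slowest tile, the exchange time is set by the tile that sends the most words, and each cycle pays one fixed barrier cost. `TileLoad`, `cycle_time`, and all the numbers are hypothetical illustrations. They are not Parendi's model or measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TileLoad:
    compute: int     # time units of computation per simulated cycle
    words_out: int   # words this tile sends to other tiles per simulated cycle


def cycle_time(tiles: List[TileLoad], sync_cost: int, cost_per_word: int) -> int:
    # The compute phase ends only when the slowest tile finishes.
    compute = max(t.compute for t in tiles)
    # Assumption: exchange time is bounded by the tile that sends the most words.
    exchange = max(t.words_out for t in tiles) * cost_per_word
    # Every simulated cycle also pays a fixed global barrier.
    return compute + exchange + sync_cost


if __name__ == "__main__":
    single = cycle_time([TileLoad(180, 0)], sync_cost=0, cost_per_word=1)
    split = cycle_time([TileLoad(100, 4), TileLoad(80, 10)], sync_cost=50, cost_per_word=1)
    print(single, split)
```

With these made-up numbers, splitting 180 units of work across two tiles brings the per-cycle time down only from 180 to 160. The barrier and the exchange eat most of the gain. This is why load balance, communication volume, and synchronization all have to be measured and optimized together.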

Innovation

Methods, ideas, or system contributions that make the work stand out.

Exploits fine-grained parallelism in RTL simulation

Maps simulation onto Graphcore IPU architecture

Develops partitioning and compilation optimization techniques
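Partitioning decides which core simulates which part of the design. As a rough illustration of the load-balancing side only, the sketch below uses the classic longest-processing-time-first heuristic. It is swapped in here for clarity and is not Parendi's partitioner. The sketch assumes the design has already been split into independent work units (for example, the logic feeding each register) with known costs, and it ignores the communication and logic-duplication trade-offs a real RTL partitioner must weigh. `greedy_partition`, `tile_loads`, and the unit costs are hypothetical.

```python
import heapq
from typing import Dict, List


def greedy_partition(costs: Dict[str, int], num_tiles: int) -> List[List[str]]:
    """Assign work units to tiles, largest first, each to the least-loaded tile."""
    heap = [(0, i) for i in range(num_tiles)]  # (current load, tile index)
    heapq.heapify(heap)
    tiles: List[List[str]] = [[] for _ in range(num_tiles)]
    # Sort by descending cost; the name breaks ties so the result is deterministic.
    for unit in sorted(costs, key=lambda u: (-costs[u], u)):
        load, idx = heapq.heappop(heap)
        tiles[idx].append(unit)
        heapq.heappush(heap, (load + costs[unit], idx))
    return tiles


def tile_loads(tiles: List[List[str]], costs: Dict[str, int]) -> List[int]:
    # Total compute cost assigned to each tile.
    return [sum(costs[u] for u in t) for t in tiles]


if __name__ == "__main__":
    cones = {"a": 7, "b": 5, "c": 4, "d": 3, "e": 1}  # hypothetical costs
    parts = greedy_partition(cones, 2)
    print(parts, tile_loads(parts, cones))
```

For the five example units, the heuristic produces two tiles with equal load (10 each). Balancing the largest per-tile load matters because, in a bulk-synchronous cycle, every core waits for the slowest one.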
