🤖 AI Summary
Current TPU design faces bottlenecks including heavy reliance on expert knowledge, high manual labor costs, and scarcity of domain-specific training data. To address these challenges, this paper proposes the first LLM-based automated TPU generation framework, specifically targeting systolic array architectures and enabling end-to-end hardware generation, from high-level specifications to synthesizable RTL. Key contributions are: (1) the first open-source, high-quality hardware-specific dataset for training and evaluation; (2) a hardware-semantic-aware RAG mechanism that substantially mitigates LLM hallucination; and (3) integrated modeling of systolic arrays, optimized approximate multiply-accumulate units, and automated hardware pipelining. Experimental results demonstrate that the generated TPUs achieve, on average, a 92% reduction in area and a 96% reduction in power consumption compared to manually optimized baselines, with significant improvements in energy efficiency and overall PPA (performance, power, area).
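To make the systolic-array target concrete, the sketch below simulates the dataflow of an output-stationary N×N systolic array performing a matrix multiply: operands are skewed in time, and each processing element (i, j) executes one multiply-accumulate per cycle to produce C[i][j]. This is a generic illustration of the architecture class the framework targets, not code from the paper itself.

```python
def systolic_matmul(A, B):
    """Cycle-level sketch of an N x N output-stationary systolic array.

    Rows of A stream in from the left and columns of B from the top,
    each skewed by one cycle per hop; PE (i, j) accumulates C[i][j].
    Illustrative model only, not TPU-Gen's generated RTL.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    # 3n - 2 cycles for all skewed operands to traverse the array.
    for t in range(3 * n - 2):
        for i in range(n):
            for j in range(n):
                k = t - i - j  # operand index reaching PE (i, j) at cycle t
                if 0 <= k < n:
                    C[i][j] += A[i][k] * B[k][j]  # one MAC per PE per cycle
    return C
```

The timing offset `t - i - j` captures why systolic arrays are attractive for TPUs: each PE only ever talks to its neighbors, so the wiring and control stay simple and regular.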
📝 Abstract
The increasing complexity and scale of Deep Neural Networks (DNNs) necessitate specialized tensor accelerators, such as Tensor Processing Units (TPUs), to meet various computational and energy efficiency requirements. Nevertheless, designing an optimal TPU remains challenging due to the high level of domain expertise required, considerable manual design time, and lack of high-quality, domain-specific datasets. This paper introduces TPU-Gen, the first Large Language Model (LLM) based framework designed to automate the exact and approximate TPU generation process, focusing on systolic array architectures. TPU-Gen is supported by a meticulously curated, comprehensive, and open-source dataset that covers a wide range of spatial array designs and approximate multiply-and-accumulate units, enabling design reuse, adaptation, and customization for different DNN workloads. The proposed framework leverages Retrieval-Augmented Generation (RAG) as an effective solution for building LLMs in a data-scarce hardware domain, addressing the most pressing issue: hallucination. TPU-Gen transforms high-level architectural specifications into optimized low-level implementations through an effective hardware generation pipeline. Our extensive experimental evaluations demonstrate superior performance, power, and area efficiency, with average reductions in area and power of 92% and 96%, respectively, relative to manually optimized reference designs. These results set new standards for driving advancements in next-generation design automation tools powered by LLMs.
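The approximate multiply-and-accumulate units mentioned above trade a small amount of arithmetic accuracy for area and power savings. One common family of such designs truncates operand LSBs before multiplying, which shrinks the partial-product array. The sketch below illustrates that general idea under stated assumptions; the truncation scheme and `trunc_bits` parameter are hypothetical and not the specific approximate units in TPU-Gen's dataset.

```python
def approx_mac(acc, a, b, trunc_bits=2):
    """Illustrative approximate MAC via operand LSB truncation.

    Zeroing the low `trunc_bits` bits of each non-negative operand
    before multiplying reduces the partial-product logic in hardware,
    at the cost of a bounded underestimate of the true product.
    Hypothetical sketch, not the paper's exact approximate unit.
    """
    mask = ~((1 << trunc_bits) - 1)  # clear the low trunc_bits bits
    return acc + (a & mask) * (b & mask)
```

With `trunc_bits=0` the unit degenerates to an exact MAC, which is why exact and approximate generation can share one parameterized template.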