🤖 AI Summary
The growing demand for efficient, domain-specific accelerators for large language model (LLM) inference calls for a systematic understanding of hardware-software co-design trade-offs. Method: This paper presents a comprehensive survey of mainstream commercial LLM accelerators and introduces the first evaluation framework covering inference efficiency, memory-bandwidth utilization, and sparsity adaptability. It combines microarchitectural feature extraction with compiler-level software-stack analysis, supported by comparative architectural studies, fine-grained performance modeling, and industrial case analyses. Contribution/Results: The study identifies six fundamental bottlenecks and distills twelve actionable design guidelines. It proposes three generations of evolutionary architectural principles and establishes the first unified benchmark and R&D roadmap specifically for LLM accelerators, offering both theoretical insight and practical engineering guidance to academia and industry.
📝 Abstract
As Large Language Models (LLMs) continue to advance, accelerators that process LLM computations efficiently have become increasingly important. This paper discusses the necessity of LLM accelerators and provides a comprehensive analysis of the hardware and software characteristics of the main commercial LLM accelerators. Based on this analysis, we propose considerations for the development of next-generation LLM accelerators and suggest future research directions.