🤖 AI Summary
This work addresses the challenges of design space exploration (DSE) for GPU architectures targeting modern AI workloads such as large language model (LLM) inference, which involve an enormous search space, high simulation costs, and complex multi-objective optimization. For the first time, LLMs are integrated into GPU architecture exploration to automatically extract architectural knowledge from simulator code, enabling the construction and dynamic refinement of design rules. These rules, combined with sensitivity analysis and bottleneck identification, efficiently guide the optimization process. The study introduces the first DSE benchmark featuring three core capability evaluations and implements a mechanism for automatic rule generation and correction. Within a design space of 4.7 million configurations, the approach identifies six designs surpassing the NVIDIA A100 in just 20 exploration steps, achieving a 17.5× improvement in exploration efficiency and a 32.9% gain in Pareto hypervolume over baseline methods.
📝 Abstract
GPU design space exploration (DSE) for modern AI workloads, such as Large Language Model (LLM) inference, is challenging because of GPUs' vast, multi-modal design spaces, high simulation costs, and complex design optimization objectives (e.g., performance, power, and area trade-offs). Existing automated DSE methods are often prohibitively expensive, either requiring an excessive number of exploration samples or depending on intricate, manually crafted analyses of interdependent critical paths guided by human heuristics. We present LUMINA, an LLM-driven GPU architecture exploration framework that leverages AI to enhance DSE efficiency and efficacy for GPUs. LUMINA extracts architectural knowledge from simulator code and performs sensitivity studies to automatically compose DSE rules, which are auto-corrected during exploration. A core component of LUMINA is a DSE Benchmark that comprehensively evaluates and enhances LLMs' capabilities across three fundamental skills required for architecture optimization, providing a principled and reproducible basis for model selection and ensuring consistent architectural reasoning. In a design space with 4.7 million possible samples, LUMINA efficiently identifies six designs with better performance and area than an NVIDIA A100 GPU, using only 20 steps via LLM-assisted bottleneck analysis. In comparison with machine-learning baselines, LUMINA achieves 17.5× higher design space exploration efficiency and 32.9% better designs (i.e., Pareto hypervolume), showcasing its ability to deliver high-quality design guidance with minimal search cost.
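The Pareto hypervolume used above as the design-quality metric can be illustrated with a minimal two-objective sketch (not the paper's implementation; it assumes both objectives, e.g., performance and inverse area, are normalized so that larger is better, with a reference point at the origin):

```python
def pareto_front(points):
    """Return the non-dominated subset of 2-D points (maximize both objectives)."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)]

def hypervolume_2d(front, ref=(0.0, 0.0)):
    """Area dominated by the front relative to a reference point
    (both objectives maximized): a sweep that accumulates rectangles."""
    pts = sorted(front, key=lambda p: p[0], reverse=True)  # sort by obj 1, descending
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:                        # each point extends the dominated area
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

# Hypothetical normalized (performance, 1/area) scores for explored designs:
designs = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0)]
front = pareto_front(designs)                 # (1,1) is dominated by (2,2)
print(hypervolume_2d(front))                  # → 6.0
```

A larger hypervolume means the explored front dominates more of the objective space, which is why a 32.9% gain over baselines corresponds to substantially better trade-off designs.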