🤖 AI Summary
This study addresses the lack of systematic evaluation of open-source large language models (LLMs) for multilingual high-performance computing (HPC) code generation. Method: We conduct the first comprehensive assessment of DeepSeek’s capabilities across four representative HPC kernels—conjugate gradient solvers, parallel heat equation solvers, DGEMM, and STREAM triad—in C++, Fortran, Julia, and Python. Code correctness, execution performance, and strong/weak scaling behavior are rigorously evaluated using MPI/OpenMP implementations, multi-scale problem sizes, and profiling via perf, time, and scaling analysis; results are benchmarked against GPT-4. Contribution/Results: DeepSeek generates syntactically correct and functionally viable HPC code across languages. However, it exhibits significantly inferior performance and scalability—particularly under large-scale parallel configurations and large-matrix workloads—compared to GPT-4. This reveals critical capability gaps in current open-source LLMs for production-grade HPC code synthesis, highlighting bottlenecks in parallel algorithm understanding, hardware-aware optimization, and scalability reasoning.
📝 Abstract
Large Language Models (LLMs), such as GPT-4 and DeepSeek, have been applied to a wide range of domains in software engineering. However, their potential in the context of High-Performance Computing (HPC) remains largely unexplored. This paper evaluates how well DeepSeek, a recent LLM, performs in generating a set of HPC benchmark codes: a conjugate gradient solver, a parallel heat equation solver, parallel matrix multiplication (DGEMM), and the STREAM triad operation. We analyze DeepSeek's code generation capabilities in traditional HPC languages such as C++ and Fortran, as well as in Julia and Python. The evaluation covers code correctness, performance, and scaling across different configurations and matrix sizes. We also provide a detailed comparison between DeepSeek and another widely used tool, GPT-4. Our results show that while DeepSeek generates functional code for HPC tasks, it lags behind GPT-4 in the scalability and execution efficiency of the generated code.
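To make the benchmark suite concrete, the simplest of the four kernels, the STREAM triad, computes `a[i] = b[i] + scalar * c[i]` over large arrays and is memory-bandwidth bound. A minimal sketch in Python/NumPy is shown below; the array size and scalar are illustrative choices, not the configurations used in the paper's experiments.

```python
import time

import numpy as np

# STREAM "triad" kernel: a = b + scalar * c, memory-bandwidth bound.
# N and scalar are illustrative, not the problem sizes from the study.
N = 10_000_000
scalar = 3.0
b = np.ones(N)
c = np.full(N, 2.0)

t0 = time.perf_counter()
a = b + scalar * c          # the triad itself
t1 = time.perf_counter()

# Three arrays of doubles are streamed: two reads plus one write.
gbytes = 3 * N * 8 / 1e9
print(f"triad: {gbytes / (t1 - t0):.2f} GB/s")
```

The same kernel written in C++ or Fortran with an OpenMP `parallel for` over the loop is what the LLM-generated codes in the evaluation are expected to produce; the reported GB/s figure is the usual STREAM metric for comparing such implementations.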