Improving MPI Error Detection and Repair with Large Language Models and Bug References

📅 2026-04-02
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the significant challenges in detecting and repairing bugs in MPI programs, which stem from their intricate inter-process communication mechanisms. Existing large language models (LLMs) generally lack sufficient knowledge of MPI-specific defects, leading to suboptimal performance. To overcome this limitation, the paper proposes a synergistic framework that integrates few-shot learning (FSL), chain-of-thought (CoT) reasoning, and retrieval-augmented generation (RAG), leveraging external MPI defect knowledge to guide LLMs toward precise bug localization and repair. The approach substantially improves bug detection accuracy from 44% to 77% and demonstrates consistent effectiveness and strong generalization across multiple state-of-the-art LLMs, offering a novel paradigm for debugging parallel programs.
๐Ÿ“ Abstract
Message Passing Interface (MPI) is a foundational technology in high-performance computing (HPC), widely used for large-scale simulations and distributed training (e.g., in machine learning frameworks such as PyTorch and TensorFlow). However, maintaining MPI programs remains challenging due to the complex interplay among processes and the intricacies of message passing and synchronization. With the advancement of large language models like ChatGPT, it is tempting to adopt such technology for automated error detection and repair. Yet, our studies reveal that directly applying large language models (LLMs) yields suboptimal results, largely because these models lack essential knowledge about correct and incorrect usage patterns, particularly the bugs found in MPI programs. In this paper, we design a bug detection and repair technique that combines Few-Shot Learning (FSL), Chain-of-Thought (CoT) reasoning, and Retrieval Augmented Generation (RAG) to enhance LLMs' ability to detect and repair errors. Surprisingly, these enhancements lead to a significant improvement in error detection accuracy, from 44% to 77%, compared to a baseline that uses ChatGPT directly. Additionally, our experiments demonstrate that our bug-referencing technique generalizes well to other large language models.
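The abstract describes guiding an LLM with retrieved MPI defect knowledge (RAG), worked examples (FSL), and step-by-step reasoning instructions (CoT). The sketch below illustrates that general idea only; it is not the paper's implementation. The defect entries, the keyword-overlap retrieval, and the prompt wording are all illustrative assumptions, with a classic head-to-head `MPI_Send`/`MPI_Recv` deadlock as the retrieved reference bug.

```python
# Sketch of an FSL + CoT + RAG prompt assembly for MPI bug detection.
# The tiny "defect knowledge base" and naive retrieval are stand-ins for
# a real retriever over a curated MPI bug corpus.
BUG_DB = [
    {
        "keywords": {"MPI_Send", "MPI_Recv", "deadlock"},
        "example": (
            "// Both ranks issue a blocking MPI_Send before any MPI_Recv,\n"
            "// so neither send can complete: a head-to-head deadlock.\n"
            "MPI_Send(buf, n, MPI_INT, peer, 0, MPI_COMM_WORLD);\n"
            "MPI_Recv(buf, n, MPI_INT, peer, 0, MPI_COMM_WORLD, &st);"
        ),
        "fix": "Reorder send/recv on one rank, or use MPI_Sendrecv or nonblocking MPI_Isend/MPI_Irecv.",
    },
    {
        "keywords": {"MPI_Reduce", "MPI_Bcast", "collective"},
        "example": "// A collective reached by only some ranks hangs the rest.",
        "fix": "Ensure every rank in the communicator calls the same collectives in the same order.",
    },
]

def retrieve(code: str) -> dict:
    """Naive keyword-overlap retrieval standing in for a real RAG retriever."""
    return max(BUG_DB, key=lambda entry: sum(k in code for k in entry["keywords"]))

def build_prompt(code: str) -> str:
    """Fold the retrieved bug into a few-shot, chain-of-thought prompt."""
    hit = retrieve(code)
    return (
        "You are an MPI debugging assistant.\n"
        # Few-shot: the retrieved reference bug and its known repair.
        f"Reference buggy pattern:\n{hit['example']}\nKnown fix: {hit['fix']}\n"
        # Chain-of-thought: require reasoning before the verdict.
        "Reason step by step about each rank's communication order, "
        "then report the bug location and a repaired version.\n"
        f"Program under analysis:\n{code}\n"
    )
```

A production retriever would use embedding similarity over a real defect corpus rather than keyword overlap, but the prompt shape (retrieved exemplar, repair hint, explicit reasoning instruction) is the part the abstract credits for the accuracy gain.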
Problem

Research questions and friction points this paper is trying to address.

MPI
error detection
bug repair
large language models
high-performance computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-Shot Learning
Chain-of-Thought
Retrieval Augmented Generation
MPI error repair
Large Language Models
Scott Piersall
Dept. of Computer Science, University of Central Florida, Orlando, 32816, FL, US
Yang Gao
Dept. of Computer Science, University of Central Florida, Orlando, 32816, FL, US
Shenyang Liu
Dept. of Computer Science, University of Central Florida, Orlando, 32816, FL, US
Liqiang Wang
Professor of Computer Science, University of Central Florida
Big Data, Deep Learning, Blockchain, Program Analysis, Parallel Computing