🤖 AI Summary
This study systematically investigates the common challenges, root causes, and mitigation strategies encountered by developers of open-source large language model (LLM) projects. It uses a mixed-methods empirical analysis of 994 closed issues sampled from 15 prominent open-source LLM repositories, combining manual annotation, qualitative coding, and statistical validation. The analysis identifies three primary categories of issue causes: (1) intrinsic model deficiencies (“Model Problem”), (2) configuration and connectivity failures (“Configuration and Connection Problem”), and (3) inadequacies in feature implementation or method design (“Feature and Method Problem”). Among issue types, “Model Issue” is the most prevalent, and “Optimize Model” is the dominant resolution strategy. This work is the first large-scale empirical investigation into debugging practices in open-source LLM development. It fills a gap in the literature and provides data-driven insights to guide developer debugging workflows, toolchain design, and the optimization of community support infrastructure.
📝 Abstract
With the advancement of Large Language Models (LLMs), an increasing number of open-source software projects use LLMs as their core functional component. Although research and practice on LLMs are attracting considerable interest, no dedicated study has explored the challenges faced by practitioners of LLM open-source projects, the causes of these challenges, or potential solutions. To fill this research gap, we conducted an empirical study to understand the issues that practitioners encounter when developing and using LLM open-source software, the possible causes of these issues, and potential solutions. We collected all closed issues from 15 LLM open-source projects and labelled the issues that met our requirements. We then randomly selected 994 of the labelled issues as the sample for data extraction and analysis to understand the prevalent issues, their underlying causes, and potential solutions. Our results show that (1) Model Issue is the most common issue faced by practitioners, (2) Model Problem, Configuration and Connection Problem, and Feature and Method Problem are the most frequent causes of the issues, and (3) Optimize Model is the predominant solution to the issues. Based on these results, we provide implications for practitioners and researchers of LLM open-source projects.