An Empirical Study on the Classification of Bug Reports with Machine Learning

📅 2025-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
High misclassification rates (>30%) of software bug reports severely impede development efficiency. To address this, we construct a large-scale, heterogeneous dataset comprising over 660,000 cross-language and cross-platform bug reports from diverse issue-tracking systems (e.g., Jira, GitHub). We systematically evaluate the impact of programming language, issue-tracking platform, and textual features (title vs. description) on automated classification performance. Our empirical study is the first to demonstrate that both programming language and tracking system significantly affect model performance; we further show that training on heterogeneous data improves cross-project generalization, boosting accuracy by up to 12.7%. Surprisingly, title and description features yield statistically indistinguishable classification performance. SVM, logistic regression, and random forest exhibit robustness across multilingual and multi-platform settings. Finally, we derive practical, industry-oriented guidelines for model selection and training, providing a methodological foundation for accurate bug report classification.

📝 Abstract
Software defects are a major threat to the reliability of computer systems. The literature shows that more than 30% of bug reports submitted in large software projects are misclassified (i.e., they are actually feature requests or mistakes made by the bug reporter), forcing developers to spend considerable effort manually inspecting them. Machine learning algorithms can be used to classify issue reports automatically. Still, little is known about key aspects of training such models, such as the influence of programming languages and issue tracking systems. In this paper, we use a dataset of more than 660,000 issue reports, collected from heterogeneous projects hosted in different issue tracking systems, to study how different factors (e.g., project language, report content) influence the performance of models in classifying issue reports. Results show that classification performance does not differ significantly between using the report title and the description; Support Vector Machine, Logistic Regression, and Random Forest are effective in classifying issue reports; programming languages and issue tracking systems influence classification outcomes; and models trained on heterogeneous projects can classify reports from projects not seen during training. Based on these findings, we propose guidelines for future research, including recommendations for using heterogeneous data and selecting high-performing algorithms.
Problem

Research questions and friction points this paper is trying to address.

Classify bug reports automatically with machine learning to reduce manual triage effort and improve software reliability.
Study how factors such as programming language and issue tracking system affect classification performance.
Propose guidelines for using heterogeneous training data and selecting effective algorithms.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine Learning for bug report classification
Analysis of programming languages' impact on models
Use of heterogeneous data for training models
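The setup the paper evaluates (text features from report titles or descriptions fed to classifiers such as SVM, logistic regression, or random forest) can be sketched as below. This is not the paper's code; the TF-IDF representation, the toy reports, and the label names are illustrative assumptions, shown here with scikit-learn.

```python
# Minimal sketch (not the paper's implementation) of bug-report
# classification: TF-IDF features from report text, fed to one of the
# algorithms the paper found effective (logistic regression here).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy reports: titles labeled "bug" vs "non-bug"
# (e.g., feature requests misfiled as bugs).
titles = [
    "App crashes on startup with null pointer exception",
    "Please add dark mode to the settings page",
    "Memory leak when closing the editor window",
    "Feature request: export results as CSV",
]
labels = ["bug", "non-bug", "bug", "non-bug"]

# Pipeline: turn raw text into TF-IDF vectors, then fit a classifier.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(titles, labels)

# Classify an unseen report title.
print(clf.predict(["Memory leak crashes the app on startup"])[0])
```

Swapping `LogisticRegression` for `sklearn.svm.LinearSVC` or `sklearn.ensemble.RandomForestClassifier` reproduces the other two algorithm families the study compares; since the paper finds titles and descriptions perform similarly, either field could serve as the input text.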