Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning

📅 2026-04-20

🤖 AI Summary
This study addresses the cost and inefficiency of manually identifying and assigning large volumes of bug reports by proposing MNAL (Mutualistic Neural Active Learning), a human-in-the-loop, cross-project framework for bug report identification. MNAL combines a neural language model with active learning and pseudo-labeling, and it is model-agnostic in that it improves performance with various underlying neural language models. Its distinguishing feature is a mutualistic relation between the model and the human labelers: the most informative human-labeled reports and their corresponding pseudo-labeled ones update the model, while the reports sent to developers for labeling are more readable and identifiable. On a large-scale dataset, MNAL is reported to achieve up to 95.8% and 196.0% effort reduction in terms of readability and identifiability during human labeling, respectively, while also outperforming state-of-the-art approaches and baselines in bug report identification. A qualitative case study with 10 participants rated MNAL as more effective while saving time and monetary resources.

📝 Abstract
Bug reports, encompassing a wide range of bug types, are crucial for maintaining software quality. However, their increasing complexity and volume make it challenging to identify them and assign them to the appropriate teams purely by hand, as dealing with all the reports is time-consuming and resource-intensive. In this paper, we introduce a cross-project framework, dubbed Mutualistic Neural Active Learning (MNAL), designed for automated and more effective identification of bug reports from GitHub repositories, boosted by human-machine collaboration. MNAL utilizes a neural language model that learns and generalizes reports across different projects, coupled with active learning to form neural active learning. A distinctive feature of MNAL is the purposely crafted mutualistic relation between the machine learner (the neural language model) and the human labelers (developers) when enriching the knowledge learned. That is, the most informative human-labeled reports and their corresponding pseudo-labeled ones are used to update the model, while the reports that need to be labeled by developers are more readable and identifiable, thereby enhancing the human-machine teaming. We evaluate MNAL on a large-scale dataset against state-of-the-art (SOTA) approaches, baselines, and different variants. The results indicate that MNAL achieves up to 95.8% and 196.0% effort reduction in terms of readability and identifiability during human labeling, respectively, while delivering better performance in bug report identification. Additionally, MNAL is model-agnostic, since it improves model performance with various underlying neural language models. To further verify the efficacy of our approach, we conducted a qualitative case study involving 10 human participants, who rated MNAL as more effective while saving more time and monetary resources.
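
As a rough illustration of how active learning and pseudo-labeling can share one unlabeled pool, the sketch below splits a pool of reports into those sent to a human and those labeled automatically. It is not the authors' algorithm: the uncertainty measure, the confidence threshold, and the names `uncertainty` and `split_pool` are all assumptions made for this example, and the readability and identifiability criteria that MNAL applies to human queries are not modeled here.

```python
from typing import List, Tuple


def uncertainty(p_bug: float) -> float:
    """Hypothetical uncertainty score: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - abs(p_bug - 0.5) * 2.0


def split_pool(
    probs: List[float], n_queries: int, confidence: float
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Split an unlabeled pool into human queries and pseudo-labels.

    probs[i] is a model's predicted probability that report i is a bug.
    Returns (query_indices, pseudo_labels), where pseudo_labels holds
    (index, label) pairs with label 1 = bug and 0 = not a bug.
    """
    # Rank reports from most to least uncertain; the top ones go to a human.
    ranked = sorted(
        range(len(probs)), key=lambda i: uncertainty(probs[i]), reverse=True
    )
    queries = ranked[:n_queries]
    asked = set(queries)

    # Assumed rule: confident predictions on the rest become pseudo-labels.
    pseudo = []
    for i, p in enumerate(probs):
        if i in asked:
            continue
        if p >= confidence:
            pseudo.append((i, 1))
        elif p <= 1.0 - confidence:
            pseudo.append((i, 0))
    return queries, pseudo


# Toy probabilities standing in for a neural language model's outputs.
pool_probs = [0.97, 0.52, 0.08, 0.45, 0.70, 0.02]
queries, pseudo = split_pool(pool_probs, n_queries=2, confidence=0.9)
print("ask a developer about:", queries)
print("pseudo-labeled:", pseudo)
```

In a full loop, the human answers and the pseudo-labels would both be added to the training set before the model is updated and the pool is scored again.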
Problem

Research questions and friction points this paper is trying to address.

bug report identification
human-machine collaboration
active learning
software quality
GitHub repositories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mutualistic Neural Active Learning
Bug Report Identification
Human-Machine Collaboration
Active Learning
Neural Language Model