🤖 AI Summary
Manual issue assignment in open-source projects is inefficient, and existing automated approaches are limited by label noise, the long-tail distribution of developer contributions, and shifts in developer activity over time. To address these problems, this paper proposes a multi-relational heterogeneous temporal graph framework for issue assignment. The framework builds a heterogeneous graph from five types of relationships among issues, developers, and source code files. A time-slicing mechanism partitions the graph into time-based subgraphs so that the model can learn stage-specific representations, while embeddings learned jointly over the three entity types capture cross-entity associations. To mitigate label bias, the authors also provide a benchmark dataset with relabeled ground truth. On this benchmark, the method improves over the best baseline by up to 45.49% in top-1 accuracy and up to 31.97% in mean reciprocal rank (MRR).
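For readers less familiar with the two reported metrics, the following minimal sketch computes top-1 accuracy and MRR over ranked developer recommendations. The function names, the toy data, and the choice to count a hit when any of several ground-truth assignees is ranked are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence, Set


def top_k_accuracy(rankings: Sequence[Sequence[str]],
                   truths: Sequence[Set[str]], k: int = 1) -> float:
    """Fraction of issues with at least one true assignee in the top k."""
    hits = sum(1 for ranked, truth in zip(rankings, truths)
               if any(dev in truth for dev in ranked[:k]))
    return hits / len(rankings)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         truths: Sequence[Set[str]]) -> float:
    """Average of 1/rank of the first correct developer (0 if none appears)."""
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        for rank, dev in enumerate(ranked, start=1):
            if dev in truth:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Toy data (hypothetical): three issues, each with a ranked candidate list.
rankings = [["alice", "bob", "carol"],
            ["bob", "alice", "carol"],
            ["carol", "bob", "alice"]]
truths = [{"alice"}, {"alice"}, {"alice", "dave"}]

top1 = top_k_accuracy(rankings, truths, k=1)
mrr = mean_reciprocal_rank(rankings, truths)
print(f"top-1 = {top1:.4f}, MRR = {mrr:.4f}")
```

In the toy data only the first issue has a correct developer at rank 1, so top-1 accuracy is 1/3, while MRR also credits the correct developers found at ranks 2 and 3.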
📝 Abstract
Issue assignment, which involves recommending the most suitable developers to address reported issues, plays a critical role in open-source software (OSS) maintenance. Given the high volume of issue reports in large-scale projects, manually assigning issues is tedious and costly. Previous studies have proposed automated issue assignment approaches that primarily focus on modeling the textual information in issue reports, developers' expertise, or interactions between issues and developers based on historical issue-fixing records. However, these approaches often suffer from performance limitations due to incorrect and missing labels in OSS datasets, the long tail of developer contributions, and changes in developer activity as the project evolves. To address these challenges, we propose IssueCourier, a novel Multi-Relational Heterogeneous Temporal Graph Neural Network approach for issue assignment. Specifically, we formalize five key relationships among issues, developers, and source code files to construct a heterogeneous graph. We then adopt a temporal slicing technique that partitions the graph into a sequence of time-based subgraphs to learn stage-specific patterns. Furthermore, we provide a benchmark dataset with relabeled ground truth to address the problem of incorrect and missing labels in existing OSS datasets. Finally, to evaluate the performance of IssueCourier, we conduct extensive experiments on our benchmark dataset. The results show that IssueCourier improves over the best baseline by up to 45.49% in top-1 and 31.97% in MRR.
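The temporal slicing idea can be illustrated with a small sketch that partitions timestamped, typed edges into fixed-width windows, yielding one heterogeneous subgraph per window. This is not IssueCourier's implementation: the `Edge` type, the `slice_by_time` helper, the relation names, and the equal-width windowing are all assumptions made for illustration, since the abstract does not name the five relationships or the slicing rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Edge:
    src: str        # e.g. "dev:alice" or "issue:1"
    relation: str   # hypothetical relation name; the paper defines its own five
    dst: str        # e.g. "issue:1" or "file:src/io.py"
    timestamp: int  # event time, here in days since project start


def slice_by_time(edges: List[Edge],
                  window: int) -> List[Dict[str, List[Tuple[str, str]]]]:
    """Partition edges into consecutive fixed-width time windows.

    Each slice maps a relation name to its (src, dst) pairs, i.e. one small
    heterogeneous subgraph per window. Empty windows are kept so that slice
    indices stay aligned with elapsed time.
    """
    if not edges:
        return []
    start = min(e.timestamp for e in edges)
    n_slices = (max(e.timestamp for e in edges) - start) // window + 1
    slices = [defaultdict(list) for _ in range(n_slices)]
    for e in edges:
        slices[(e.timestamp - start) // window][e.relation].append((e.src, e.dst))
    return [dict(s) for s in slices]


# Toy history (hypothetical): two issues over 45 days, sliced into 30-day windows.
edges = [
    Edge("dev:alice", "reports", "issue:1", 0),
    Edge("dev:bob", "comments_on", "issue:1", 3),
    Edge("dev:bob", "resolves", "issue:1", 12),
    Edge("issue:1", "touches", "file:src/io.py", 12),
    Edge("dev:carol", "reports", "issue:2", 45),
]
subgraphs = slice_by_time(edges, window=30)
for i, graph in enumerate(subgraphs):
    print(i, {rel: len(pairs) for rel, pairs in graph.items()})
```

In a full pipeline, each slice would presumably be encoded by a graph neural network in chronological order so that later slices reflect more recent developer activity; that step is beyond the scope of this sketch.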