QualiTagger: Automating software quality detection in issue trackers

📅 2025-04-15

🤖 AI Summary

To address the difficulty of automatically identifying software quality degradation in open-source project issues, and the resulting delay in technical debt warnings, this paper proposes a fine-grained quality attribute recognition method based on natural language processing. The authors first construct a large-scale, real-world corpus of GitHub issues annotated with quality labels, combining expert manual annotation with data mining techniques. They then design a Transformer-based sequence labeling model (BERT/RoBERTa), tailored for cross-project generalization, that identifies text fragments associated with quality attributes such as security and maintainability. Evaluated across multiple projects, the approach achieves an F1-score of 0.89, improves inter-annotator agreement among students by 42%, and detects latent security issues in industrial settings with 93% accuracy, substantially outperforming rule-based and shallow machine learning baselines. The method enables dynamic early warning of technical debt and supports longitudinal analysis of quality evolution.

📝 Abstract

A system's quality is a major concern for development teams as it evolves. Understanding the effects of a loss of quality in the codebase is crucial to avoid side effects such as the appearance of technical debt. Although the identification of these qualities in software requirements described in natural language has been investigated, most results are not applicable in practice, having been validated only on small datasets and a limited number of projects. For many years, machine learning (ML) techniques have proven to be a valid way to identify and tag terms described in natural language. To advance previous work, in this research we use cutting-edge models such as Transformers, together with a vast dataset mined and curated from GitHub, to identify what text is usually associated with different quality properties. We also study the distribution of such qualities in issue trackers from openly accessible software repositories, and we evaluate our approach both with students from a software engineering course and by applying it to recognize security labels in industry.
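
To make the two tasks in the abstract concrete (tagging issue text with quality properties, then studying how those tags are distributed), here is a minimal sketch in Python. It is not the paper's Transformer model: it uses a simple keyword lookup, closer to the rule-based baselines the paper compares against, purely to show the shape of the pipeline. The keyword sets and the function names `tag_issue` and `label_distribution` are assumptions made for illustration; only the `security` and `maintainability` labels come from the summary above.

```python
import re
from collections import Counter

# Hypothetical keyword sets, chosen only for illustration.
# The paper learns these associations with a Transformer instead.
QUALITY_KEYWORDS = {
    "security": {"vulnerability", "exploit", "injection", "xss"},
    "maintainability": {"refactor", "cleanup", "duplication", "deprecated"},
}


def tag_issue(text):
    """Return the sorted quality labels whose keywords occur in the issue text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        label for label, keywords in QUALITY_KEYWORDS.items() if words & keywords
    )


def label_distribution(issues):
    """Count how many issues carry each quality label."""
    counts = Counter()
    for text in issues:
        counts.update(tag_issue(text))
    return counts


issues = [
    "SQL injection in login form",
    "Refactor the parser to remove duplication",
    "Update README",
]
distribution = label_distribution(issues)
```

A learned classifier would replace `tag_issue`, while the counting step in `label_distribution` would stay the same regardless of how the labels are produced.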

Problem

Research questions and friction points this paper is trying to address.

Automate detection of software quality in issue trackers

Identify text linked to quality properties using Transformers

Study quality distribution in open-source repository issue trackers

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Transformers for quality detection

Leverages large GitHub dataset

Evaluated in academia and industry
