🤖 AI Summary
This study addresses the high annotation cost and error-proneness of manually identifying concurrency bug reports. The authors propose a classification framework based on multi-granularity linguistic patterns, constructing 58 domain-specific language patterns across four levels: word, phrase, sentence, and report. The approach integrates pattern matching, traditional machine learning, fine-tuning of pre-trained language models (PLMs), and prompting of large language models (LLMs); notably, domain-specific linguistic knowledge is injected directly into the PLM fine-tuning process. The work also releases a high-quality annotated dataset. Experiments show that the method achieves precision of 91% and 93% on the GitHub and Jira datasets, respectively, and maintains a high precision of 91% on hold-out test data.
📝 Abstract
With the growing ubiquity of multi-core architectures, concurrent systems have become essential but increasingly prone to complex issues such as data races and deadlocks. While modern issue-tracking systems facilitate the reporting of such problems, labeling concurrency-related bug reports remains a labor-intensive and error-prone task. This paper presents a linguistic-pattern-based framework for automatically identifying concurrency bug reports. We derive 58 distinct linguistic patterns from 730 manually labeled concurrency bug reports, organized across four levels: word-level (keywords), phrase-level (n-grams), sentence-level (semantics), and bug-report-level (context). To assess their effectiveness, we evaluate four complementary approaches (matching, learning, prompting, and fine-tuning), spanning traditional machine learning, large language models (LLMs), and pre-trained language models (PLMs). Our comprehensive evaluation on 12 large-scale open-source projects (10,920 issue reports from GitHub and Jira) demonstrates that fine-tuning PLMs with linguistic-pattern-enriched inputs achieves the best performance, reaching a precision of 91% on GitHub and 93% on Jira, and maintaining strong precision (91%) on post-cutoff data. The contributions of this work are: (1) a comprehensive taxonomy of linguistic patterns for concurrency bugs, (2) a novel fine-tuning strategy that integrates domain-specific linguistic knowledge into PLMs, and (3) a curated, labeled dataset to support reproducible research. Together, these advances provide a foundation for improving the automation, precision, and interpretability of concurrency bug classification.
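To make the word- and phrase-level granularities concrete, the following is a minimal, illustrative sketch of pattern matching against a bug report's text. The keyword set and n-gram regexes below are hypothetical examples chosen for illustration; they are not the paper's actual 58 patterns, and the real framework additionally applies sentence-level and report-level patterns plus learned models.

```python
import re

# Hypothetical word-level patterns (keywords); not the paper's real pattern set.
WORD_PATTERNS = {"deadlock", "race", "mutex", "semaphore", "atomicity"}

# Hypothetical phrase-level patterns (n-grams), matched case-insensitively.
PHRASE_PATTERNS = [
    re.compile(r"\bdata race\b", re.IGNORECASE),
    re.compile(r"\bthread[- ]safe\b", re.IGNORECASE),
    re.compile(r"\block (?:contention|ordering)\b", re.IGNORECASE),
]

def match_concurrency_patterns(report: str) -> bool:
    """Flag a report as concurrency-related if any word or phrase pattern matches."""
    # Word level: tokenize and intersect with the keyword set.
    tokens = {t.lower() for t in re.findall(r"[A-Za-z-]+", report)}
    if tokens & WORD_PATTERNS:
        return True
    # Phrase level: scan for multi-word n-gram patterns.
    return any(p.search(report) for p in PHRASE_PATTERNS)

print(match_concurrency_patterns("App hangs: deadlock between worker threads"))  # True
print(match_concurrency_patterns("Button color is wrong on dark theme"))         # False
```

In the paper's fine-tuning variant, matches like these would not classify the report directly; instead, the matched linguistic cues enrich the input fed to the PLM.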