Applying Large Language Models to Issue Classification: Revisiting with Extended Data and New Models

📅 2025-05-01

🏛️ Science of Computer Programming

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Automated classification of issue reports in open-source projects remains challenging: manual labeling is inefficient, and existing automated approaches generalize poorly because they rely heavily on large-scale annotated data. Method: This paper proposes a lightweight adaptation framework for large language models (LLMs) that enables zero-shot and few-shot cross-project issue classification. It systematically evaluates the generalization capabilities of multiple generations of both open-source (LLaMA-3, Qwen2) and closed-source (Claude-3, GPT-4) LLMs, and introduces instruction tuning with context enhancement, which integrates dynamic in-context example retrieval and structured output constraints. Contribution/Results: The approach achieves an 89.7% F1 score on cross-project classification, outperforming BERT by 12.3 percentage points; long-tail category accuracy improves by 27.6%, and domain adaptation cost is significantly reduced.
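
To make the two context-enhancement ideas named in the summary more concrete, here is a minimal Python sketch of few-shot prompting with retrieved examples and a constrained JSON output. It is an illustration under stated assumptions, not the paper's implementation: the label set (`bug`, `feature`, `question`), the Jaccard token-overlap retriever, the prompt wording, and all function names are hypothetical, and the actual model call is left out.

```python
import json
import re

# Hypothetical label set; the summary does not list the categories used.
LABELS = ("bug", "feature", "question")


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_examples(issue, pool, k=2):
    """Return the k labelled issues from `pool` most similar to `issue`.

    Assumption: token overlap stands in for whatever retriever the paper uses.
    """
    query = tokenize(issue)
    ranked = sorted(
        pool,
        key=lambda ex: jaccard(query, tokenize(ex["text"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(issue, examples):
    """Assemble a few-shot prompt that asks for a JSON object with one label."""
    lines = [
        "Classify the issue report as one of: " + ", ".join(LABELS) + ".",
        'Answer with JSON only, e.g. {"label": "bug"}.',
        "",
    ]
    for ex in examples:
        lines.append("Issue: " + ex["text"])
        lines.append(json.dumps({"label": ex["label"]}))
        lines.append("")
    lines.append("Issue: " + issue)
    return "\n".join(lines)


def parse_label(raw):
    """Validate a model reply against the output constraint; None if invalid."""
    try:
        label = json.loads(raw).get("label")
    except (json.JSONDecodeError, AttributeError):
        return None
    return label if label in LABELS else None
```

In use, the string returned by `build_prompt` would be sent to whichever LLM is being evaluated, and `parse_label` would reject any reply that is not valid JSON or names a label outside the allowed set, so malformed outputs can be retried or counted as errors rather than silently misclassified.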