HaPy-Bug -- Human Annotated Python Bug Resolution Dataset

📅 2025-04-07
🤖 AI Summary
This work addresses the lack of fine-grained, high-quality annotated data for modeling Python bug-fixing processes. We introduce HaPy-Bug, the first Python bug-fix dataset supporting multi-expert collaboration and line-level human annotation, comprising 793 real-world fix commits. Each modified line is independently labeled by three domain experts; the annotations cover file purpose, line-level change type, and reviewer confidence. HaPy-Bug enables, for the first time in Python, systematic identification and quantitative analysis of tangled changes, facilitating repair pattern mining and statistical modeling. Analysis of the dataset reveals regularities in the distribution of file purposes, prevalent repair patterns, and change coupling across Python projects. The dataset and methodology provide a reproducible benchmark for defect prediction, automated repair recommendation, and repository analytics.

📝 Abstract
We present HaPy-Bug, a curated dataset of 793 Python source code commits associated with bug fixes, with each line of code annotated by three domain experts. The annotations offer insights into the purpose of modified files, changes at the line level, and reviewers' confidence levels. We analyze HaPy-Bug to examine the distribution of file purposes, types of modifications, and tangled changes. Additionally, we explore its potential applications in bug tracking, the analysis of bug-fixing practices, and the development of repository analysis tools. HaPy-Bug serves as a valuable resource for advancing research in software maintenance and security.
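To make the idea of three annotators per line and tangled changes concrete, here is a minimal sketch of how per-line labels could be reduced to a consensus and how a commit could then be flagged as tangled. Everything in it is an assumption for illustration: the label names (`bug_fix`, `refactoring`, `test`, `documentation`), the `(file, line)` keys, and the rule "more than one distinct consensus label means tangled" are hypothetical and are not taken from the HaPy-Bug schema or the authors' method.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by a strict majority of annotators, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    # Strict majority: more than half of the annotators must agree.
    return label if count * 2 > len(labels) else None


def consensus_labels(commit_annotations):
    """Map each (file, line) key to its majority label (None if no majority)."""
    return {key: majority_label(votes) for key, votes in commit_annotations.items()}


def is_tangled(commit_annotations):
    """Assumed rule: a commit is tangled if its lines carry more than one
    distinct consensus label. Lines without a majority are ignored."""
    distinct = {
        label
        for label in consensus_labels(commit_annotations).values()
        if label is not None
    }
    return len(distinct) > 1


# Hypothetical annotations: three votes per modified line.
example_commit = {
    ("app/core.py", 10): ["bug_fix", "bug_fix", "refactoring"],
    ("app/core.py", 11): ["refactoring", "refactoring", "refactoring"],
    ("app/core.py", 12): ["bug_fix", "test", "documentation"],
}

if __name__ == "__main__":
    print(consensus_labels(example_commit))
    print(is_tangled(example_commit))
```

A strict majority is only one possible aggregation. With three annotators it leaves lines on which all three disagree unresolved, which is why the sketch ignores them rather than guessing.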
Problem

Research questions and friction points this paper is trying to address.

Analyzing annotated Python bug fixes for patterns
Exploring bug-fixing practices and modification types
Enhancing software maintenance and security research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curated dataset of Python bug-fix commits
Line-level annotations by domain experts
Analyzes file purposes and modification types
Piotr Przymus
Nicolaus Copernicus University in Toruń
software engineering, data mining, machine learning
Mikolaj Fejzer
Nicolaus Copernicus University
Jakub Narkebski
Nicolaus Copernicus University
Radoslaw Woźniak
Nicolaus Copernicus University
Lukasz Halada
University of Wrocław
Aleksander Kazecki
Nicolaus Copernicus University
Mykhailo Molchanov
Kyiv Polytechnic Institute
Krzysztof Stencel
Professor of Computer Science, University of Warsaw
Databases, Software Engineering, Formal Methods