🤖 AI Summary
This study addresses the limited systematic understanding of hackathons as a software engineering activity, which has been hindered by a lack of large-scale empirical data. To bridge this gap, the authors introduce HackRep, a dataset of 100,356 hackathon GitHub repositories that offers a reproducible, large-scale empirical foundation for hackathon research. The dataset is constructed using repository metadata, commit timestamps, and contributor information, enabling preliminary analyses of project continuation, team composition, and geographic distribution. The work also shows that hackathon duration can be estimated from commit timestamps. This structured resource lets the software engineering community investigate hackathons empirically and at scale.
📝 Abstract
Hackathons are time-bound collaborative events that often target software creation. Although hackathons have been studied before, existing work has focused on in-depth case studies, which limits our understanding of hackathons as a software engineering activity. To complement the existing body of knowledge, we introduce HackRep, a dataset of 100,356 hackathon GitHub repositories. We illustrate how HackRep can benefit software engineering researchers through a preliminary investigation of hackathon project continuation, hackathon team composition, and an estimation of hackathon geography. We also outline further opportunities the dataset offers, for instance by showing that hackathon durations can be estimated from commit timestamps.
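To make the idea of estimating a hackathon's duration from commit timestamps concrete, here is a minimal Python sketch. It is not the authors' method, which the abstract does not describe. It assumes a simple heuristic: commits separated by no more than a chosen gap belong to the same burst of activity, and the burst with the most commits is taken to be the event. The function name `estimate_event_duration`, the 12-hour `max_gap` threshold, and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def estimate_event_duration(timestamps, max_gap=timedelta(hours=12)):
    """Return the time span of the densest burst of commits.

    Assumed heuristic (not taken from the paper): consecutive commits no
    more than `max_gap` apart form one burst, and the burst containing
    the most commits approximates the hackathon itself.
    """
    if not timestamps:
        return timedelta(0)

    ordered = sorted(timestamps)
    bursts = [[ordered[0]]]
    for ts in ordered[1:]:
        if ts - bursts[-1][-1] <= max_gap:
            bursts[-1].append(ts)   # still within the current burst
        else:
            bursts.append([ts])     # gap too large: start a new burst

    busiest = max(bursts, key=len)
    return busiest[-1] - busiest[0]


# Hypothetical commit history for a single repository.
commits = [
    datetime(2023, 3, 1, 9, 0),    # repository setup before the event
    datetime(2023, 3, 4, 10, 0),
    datetime(2023, 3, 4, 15, 30),
    datetime(2023, 3, 4, 23, 45),
    datetime(2023, 3, 5, 8, 15),
    datetime(2023, 3, 5, 16, 0),
    datetime(2023, 3, 20, 12, 0),  # follow-up work after the event
]

duration = estimate_event_duration(commits)
print(duration)  # 1 day, 6:00:00
```

The gap threshold drives the result: a smaller value splits an overnight pause into separate bursts, while a larger one can merge pre-event setup or post-event continuation into the estimate, so any real analysis would need to calibrate it against events with known durations.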