🤖 AI Summary
Toxic discourse in GitHub's open-source community undermines collaborative ecosystems, motivating a systematic analysis of its nature and correlates. Method: Drawing on a stratified random sample of 2,828 projects, the study combines the SE domain-specific ToxiCR detector with 600 manually validated annotations in a mixed-methods design: automated labeling, qualitative thematic analysis, and statistical modeling. Contribution/Results: Gaming-related projects exhibit toxicity seven times more prevalent than non-gaming ones; toxic behavior is highly recurrent and targeted; profanity is the most frequent type of toxicity, followed by trolling and insults; and contributors frequently occupy dual roles as both authors and targets of toxic comments. Corporate sponsorship and high issue-resolution rates are both associated with lower toxicity. Actionable predictors, including project popularity, sponsorship status, and governance efficacy, can inform evidence-based platform interventions and community governance strategies.
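The stratified sampling step mentioned above can be sketched as follows. This is a hypothetical illustration, not the study's actual procedure: the strata (here just "gaming" vs. "non-gaming"), stratum sizes, and proportional-allocation rule are all assumptions for the example.

```python
import random
from collections import defaultdict

def stratified_sample(projects, key, n_total, seed=0):
    """Draw a sample in which each stratum is represented proportionally."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for p in projects:
        strata[key(p)].append(p)
    sample = []
    for members in strata.values():
        # Allocate sample slots proportionally to the stratum's size.
        k = round(n_total * len(members) / len(projects))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# Illustrative population: 10,000 projects, 10% of them gaming-related.
projects = [{"id": i, "gaming": i % 10 == 0} for i in range(10_000)]
picked = stratified_sample(projects, key=lambda p: p["gaming"], n_total=2828)
print(len(picked), sum(p["gaming"] for p in picked))  # 2828 total, ~10% gaming
```

Proportional allocation keeps rare strata (such as gaming projects) represented at their population rate, which is what makes cross-stratum comparisons like the seven-fold toxicity gap meaningful.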
📝 Abstract
Toxicity on GitHub can severely impact Open Source Software (OSS) development communities. To mitigate such behavior, a better understanding of its nature, and of how various measurable characteristics of project contexts and participants are associated with its prevalence, is necessary. To achieve this goal, we conducted a large-scale mixed-methods empirical study of 2,828 GitHub-based OSS projects, randomly selected via a stratified sampling strategy. Using ToxiCR, an SE domain-specific toxicity detector, we automatically classified each comment as toxic or non-toxic. Additionally, we manually analyzed a random sample of 600 comments to validate ToxiCR's performance and gain insights into the nature of toxicity within our dataset. The results of our study suggest that profanity is the most frequent type of toxicity on GitHub, followed by trolling and insults. While a project's popularity is positively associated with the prevalence of toxicity, its issue resolution rate has the opposite association. Corporate-sponsored projects are less toxic, but gaming projects are seven times more toxic than non-gaming ones. OSS contributors who have authored toxic comments in the past are significantly more likely to repeat such behavior. Moreover, such individuals are more likely to become targets of toxic comments themselves.
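The manual-validation step described above, checking an automated detector's labels against human annotations on a sample of 600 comments, can be sketched as an agreement computation. This is a minimal illustration with synthetic labels; the agreement function, the simulated 95% detector accuracy, and the 10% toxicity rate are assumptions, not the study's reported numbers.

```python
import random

def cohens_kappa(auto, manual):
    """Chance-corrected agreement between two binary label lists (1 = toxic)."""
    n = len(auto)
    observed = sum(a == m for a, m in zip(auto, manual)) / n
    p_auto = sum(auto) / n      # fraction labeled toxic by the detector
    p_man = sum(manual) / n     # fraction labeled toxic by human annotators
    expected = p_auto * p_man + (1 - p_auto) * (1 - p_man)
    return (observed - expected) / (1 - expected)

random.seed(0)
# Synthetic ground truth: 600 comments, ~10% toxic.
manual = [1 if random.random() < 0.1 else 0 for _ in range(600)]
# Simulate a detector that matches the annotators ~95% of the time.
auto = [m if random.random() < 0.95 else 1 - m for m in manual]

accuracy = sum(a == m for a, m in zip(auto, manual)) / len(manual)
print(f"accuracy = {accuracy:.3f}, kappa = {cohens_kappa(auto, manual):.3f}")
```

Raw accuracy is misleading on imbalanced data (a detector that labels everything non-toxic would already score ~90% here), so a chance-corrected statistic such as Cohen's kappa is the more informative validation metric.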