AI Summary
Existing GitHub gender-bias detection tools rely predominantly on keyword matching or binary classification, missing the implicit, context-dependent biases that contribute to attrition among underrepresented developers. This paper introduces the first fine-grained, multi-class framework tailored to technical communities, systematically defining and annotating twelve distinct categories of gendered discourse and moving beyond conventional binary paradigms. Leveraging instruction-tuned large language models (specifically GPT-4o), we employ iterative prompt engineering coupled with multi-round expert validation to sharpen model sensitivity to subtle, adversarial language and improve output interpretability. Under rigorous evaluation with F1 score and the Matthews Correlation Coefficient (MCC), our best configuration (GPT-4o with Prompt 19) achieves an MCC of 0.501, with low false-positive rates and statistically significant improvements over all baselines. The framework offers a practical, deployable path toward more inclusive open-source ecosystems.
Abstract
Background: Sexist and misogynistic behavior significantly hinders inclusion in technical communities such as GitHub, driving developers, especially those from underrepresented groups, to leave in response to subtle biases and microaggressions. Current moderation tools rely primarily on keyword filtering or binary classifiers, limiting their ability to detect nuanced harm.
Aims: This study introduces a fine-grained, multi-class classification framework that leverages instruction-tuned Large Language Models (LLMs) to identify twelve distinct categories of sexist and misogynistic comments on GitHub.
Method: We utilized an instruction-tuned LLM-based framework with systematic prompt refinement across 20 iterations, evaluated on 1,440 labeled GitHub comments across twelve sexism/misogyny categories. Model performances were rigorously compared using precision, recall, F1-score, and the Matthews Correlation Coefficient (MCC).
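The Matthews Correlation Coefficient used above generalizes to the multi-class setting (Gorodkin's R_K), which balances all cells of the confusion matrix and is robust to the class imbalance typical of sexism/misogyny data. A minimal pure-Python sketch of that computation (not the paper's evaluation code):

```python
from collections import Counter
from math import sqrt

def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews Correlation Coefficient (Gorodkin's R_K)."""
    s = len(y_true)                                  # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified
    t_counts = Counter(y_true)                       # true occurrences per class
    p_counts = Counter(y_pred)                       # predicted occurrences per class
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in t_counts)
    denom_t = s * s - sum(n * n for n in t_counts.values())
    denom_p = s * s - sum(n * n for n in p_counts.values())
    if denom_t == 0 or denom_p == 0:
        return 0.0  # degenerate case: a single class present
    return cov_tp / sqrt(denom_t * denom_p)
```

For two classes this reduces to the familiar binary MCC; an MCC of 0.501 thus indicates moderate positive correlation between predicted and true categories, well above the 0.0 expected from chance.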
Results: Our optimized approach (GPT-4o with Prompt 19) achieved an MCC of 0.501, significantly outperforming baseline approaches. While the model produced few false positives, it still struggled to interpret nuanced, context-dependent sexism and misogyny reliably.
Conclusion: Well-designed prompts with clear definitions and structured outputs significantly improve the accuracy and interpretability of sexism detection, enabling precise and practical moderation on developer platforms like GitHub.
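The conclusion's recipe of clear per-category definitions plus a constrained output format can be sketched as a prompt builder. The category names and wording below are illustrative placeholders, not the paper's actual twelve definitions or the text of Prompt 19:

```python
# Illustrative category definitions (the real framework defines twelve).
CATEGORY_DEFS = {
    "explicit_insult": "Directly demeaning language targeting someone's gender.",
    "stereotyping": "Assigning roles or abilities based on gender.",
    "condescension": "Patronizing remarks implying lesser competence.",
    "none": "No sexist or misogynistic content.",
}

def build_prompt(comment: str, category_defs: dict) -> str:
    """Assemble an instruction-style prompt: definitions first, then a fixed output schema."""
    defs = "\n".join(f"- {name}: {desc}" for name, desc in category_defs.items())
    return (
        "You are moderating comments on a developer collaboration platform.\n"
        "Classify the comment into exactly one category defined below.\n\n"
        f"Category definitions:\n{defs}\n\n"
        f"Comment: {comment!r}\n\n"
        "Answer with a single category name, then a one-sentence rationale."
    )
```

Requiring a single category name plus a short rationale is one way to obtain the structured, interpretable outputs the conclusion credits with improving both accuracy and moderator trust.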