Gender Disparities in StackOverflow's Community-Based Question Answering: A Matter of Quantity versus Quality

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the root causes of the gender-based reputation gap on Stack Overflow: whether it stems from systematic biases in answer quality or differences in user activity levels. To address this, we present the first integrated analysis combining human evaluation with large language model–based automated scoring to assess both answer quality and the selection mechanism of “best answers” across genders. Our findings reveal no significant gender differences in answer quality, nor evidence of gender bias in the selection of best answers. Instead, the reputation disparity is primarily driven by participation frequency—specifically, the volume of questions asked and answers provided. By innovatively synthesizing multi-dimensional evaluation approaches, this work demonstrates that gender inequality on the platform arises from behavioral engagement patterns rather than disparities in content quality.

📝 Abstract
Community Question-Answering platforms, such as Stack Overflow (SO), are valuable knowledge exchange and problem-solving resources. These platforms incorporate mechanisms to assess the quality of answers and participants' expertise, ideally free from discriminatory biases. However, prior research has highlighted persistent gender biases, raising concerns about the inclusivity and fairness of these systems. Addressing such biases is crucial for fostering equitable online communities. While previous studies focus on detecting gender bias by comparing male and female user characteristics, they often overlook the interaction between genders, inherent answer quality, and the selection of "best answers" by question askers. In this study, we investigate whether answer quality is influenced by gender using a combination of human evaluations and automated assessments powered by Large Language Models. Our findings reveal no significant gender differences in answer quality, nor any substantial influence of gender bias on the selection of "best answers." Instead, we find that the significant gender disparities in SO's reputation scores are primarily attributable to differences in users' activity levels, e.g., the number of questions and answers they write. Our results have important implications for the design of scoring systems in community question-answering platforms. In particular, reputation systems that heavily emphasize activity volume risk amplifying gender disparities that do not reflect actual differences in answer quality, calling for more equitable design strategies.
Problem

Research questions and friction points this paper is trying to address.

gender bias
community question-answering
answer quality
reputation system
Stack Overflow
Innovation

Methods, ideas, or system contributions that make the work stand out.

gender bias
answer quality
large language models
reputation systems
community question-answering