🤖 AI Summary
This study investigates whether large language models (LLMs), specifically ChatGPT, exhibit gender or racial bias when automatically coding collaborative communication data. Method: Drawing on three canonical collaborative tasks (negotiation, problem solving, and decision making), we apply a rule-based coding framework with human-annotated reference codes and conduct a systematic fairness evaluation of ChatGPT’s coding performance across these tasks. Contribution/Results: Experimental results show no statistically significant bias along gender or racial dimensions (p > 0.05), and ChatGPT’s coding accuracy does not differ significantly from that of human annotators (Δ < 2.1%, p = 0.12). This work fills an empirical gap in fairness research on LLMs for collaborative assessment and demonstrates that ChatGPT-based coding is reliable and scalable for large-scale measurement of collaborative competence.
📝 Abstract
Assessing communication and collaboration at scale depends on the labor-intensive task of coding communication data into categories defined by different frameworks. Prior research has established that ChatGPT can be instructed directly with coding rubrics to code communication data, achieving accuracy comparable to that of human raters. However, whether coding produced by ChatGPT or similar AI technologies exhibits bias against different demographic groups, such as gender and racial groups, remains unclear. To fill this gap, this paper investigates ChatGPT-based automated coding of communication data using a typical coding framework for collaborative problem solving and examines differences across gender and racial groups. The analysis draws on data from three types of collaborative tasks: negotiation, problem solving, and decision making. Our results show that ChatGPT-based coding exhibits no significant bias across gender or racial groups, paving the way for its adoption in large-scale assessment of collaboration and communication.
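The abstract does not specify the statistical procedure behind the bias comparison; the sketch below shows one common way such a check could be run, assuming per-utterance ChatGPT codes, human gold codes, and a speaker demographic label. The column names (`group`, `ai_correct`) and the use of a chi-square test of independence are illustrative assumptions, not the authors' actual method.

```python
# Minimal sketch (not the paper's code): test whether automated-coding accuracy
# is independent of demographic group via a chi-square test of independence.
import pandas as pd
from scipy.stats import chi2_contingency


def bias_test(df: pd.DataFrame, group_col: str = "group",
              correct_col: str = "ai_correct") -> float:
    """Return the p-value for H0: coding accuracy is independent of group.

    Expects one row per coded utterance:
      - group_col:   demographic label of the speaker (e.g., gender or race)
      - correct_col: 1 if the AI code matches the human gold code, else 0
    """
    # Contingency table: demographic groups x (correct, incorrect)
    table = pd.crosstab(df[group_col], df[correct_col])
    chi2, p, dof, expected = chi2_contingency(table)
    return p


# Illustrative usage with toy data (not real study data):
# df = pd.DataFrame({"group": ["F", "F", "M", "M"],
#                    "ai_correct": [1, 0, 1, 1]})
# print(bias_test(df))  # p > 0.05 would suggest no detectable accuracy gap
```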