🤖 AI Summary
This study addresses implicit gender stereotypes embedded in introductory Scratch programming tutorials. Method: Inspired by the software engineering concept of “code smells,” we propose the novel notion of “gender stereotype smells” and develop an operational automated assessment framework. The framework models bias across four dimensions—character representation, narrative structure, task types, and programming concepts—and integrates large language models (LLMs), rule-based engines, and educational content analysis to detect and mitigate stereotypical patterns. Contribution/Results: Empirical evaluation on 73 widely used tutorials reveals pervasive gender bias. While off-the-shelf LLMs exhibit limited detection accuracy, our framework reliably identifies stereotypes and guides the generation of more inclusive instructional content. To our knowledge, this is the first quantifiable, intervention-ready assessment and mitigation framework for gender bias specifically designed for introductory programming education.
📝 Abstract
Gender stereotypes in introductory programming courses often go unnoticed, yet they can negatively influence young learners' interest and learning, particularly for under-represented groups such as girls. Popular tutorials on block-based programming with Scratch may unintentionally reinforce biases through character choices, narrative framing, or activity types. Educators currently lack support in identifying and addressing such bias. With large language models (LLMs) increasingly used to generate teaching materials, this problem is potentially exacerbated by LLMs trained on biased datasets. However, LLMs also offer an opportunity to address this issue. In this paper, we explore the use of LLMs for automatically identifying gender-stereotypical elements in Scratch tutorials, thus offering feedback on how to improve teaching content. We develop a framework for assessing gender bias that considers characters, content, instructions, and programming concepts. Analogous to how code analysis tools provide feedback on code in terms of code smells, we operationalise this framework using an automated tool chain that identifies *gender stereotype smells*. Evaluation on 73 popular Scratch tutorials from leading educational platforms demonstrates that stereotype smells are common in practice. LLMs are not effective at detecting them, but our gender bias evaluation framework can guide LLMs in generating tutorials with fewer stereotype smells.