Building babyGPTs: Youth Engaging in Data Practices and Ethical Considerations through the Construction of Generative Language Models

📅 2025-04-20
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
Adolescents are typically passive users in AI education, with limited opportunities to engage meaningfully in generative AI development. Method: This study investigates the feasibility of involving 14–15-year-olds as co-designers of a generative language model (GLM), specifically babyGPT, through a structured, semester-long project centered on script generation. Activities included localized corpus construction, data annotation, lightweight LLM fine-tuning using Hugging Face Transformers, collaborative prompt engineering, and structured ethics reflection workshops. Contribution/Results: For the first time, empirical evidence demonstrates that adolescents can substantively undertake end-to-end GLM development tasks: all three participants successfully built functional script generators and exhibited robust data literacy, foundational model understanding, and nuanced ethical reasoning. The work extends the “children as technology co-creators” paradigm into the large language model era and provides a replicable pedagogical framework for AI literacy education and responsible innovation.

📝 Abstract
As generative language models (GLMs) have gained popularity, youth are increasingly using them in their everyday lives. As such, most research has centered on supporting youth as users of GLM-powered systems. However, we know little of how to engage youth in the design of these models. Building on the rich legacy of child-computer interaction research that positions youth as designers of computing systems, we explore how to support young people in designing GLMs. Through a case study of three teenagers (ages 14-15) building a babyGPT screenplay generator, we illustrate how the team developed a model while engaging in artificial intelligence/machine learning-relevant data practices and addressing ethical issues. This paper contributes a case study that demonstrates the feasibility of engaging youth in building GLMs.

Problem

Research questions and friction points this paper is trying to address.

Engaging youth in designing generative language models
Supporting teenagers in AI/ML data practices
Addressing ethical issues in youth-built GLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Youth design generative language models
Teenagers build babyGPT screenplay generator
Engage youth in AI data practices