Building babyGPTs: Youth Engaging in Data Practices and Ethical Considerations through the Construction of Generative Language Models

📅 2025-04-20

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Adolescents are typically positioned as passive users in AI education, with few opportunities to take part in generative AI development. Method: This study investigates the feasibility of involving 14–15-year-olds as co-designers of a generative language model (GLM), specifically a babyGPT, through a structured, semester-long project centered on screenplay generation. Activities included localized corpus construction, data annotation, lightweight LLM fine-tuning using Hugging Face Transformers, collaborative prompt engineering, and structured ethics reflection workshops. Contribution/Results: The case study provides empirical evidence that adolescents can take on GLM development tasks: the three participants, working as a team, built a functional screenplay generator while engaging in AI/ML-relevant data practices and reasoning about ethical issues. The work extends the “children as technology co-creators” paradigm into the large language model era and provides a replicable pedagogical framework for AI literacy education and responsible innovation.

📝 Abstract

As generative language models (GLMs) have gained popularity, youth are increasingly using them in their everyday lives. As such, most research has centered on supporting youth as users of GLM-powered systems. However, we know little of how to engage youth in the design of these models. Building on the rich legacy of child-computer interaction research that positions youth as designers of computing systems, we explore how to support young people in designing GLMs. Through a case study of three teenagers (ages 14-15) building a babyGPT screenplay generator, we illustrate how the team developed a model while engaging in artificial intelligence/machine learning-relevant data practices and addressing ethical issues. This paper contributes a case study that demonstrates the feasibility of engaging youth in building GLMs.

Problem

Research questions and friction points this paper is trying to address.

Engaging youth in designing generative language models

Supporting teenagers in AI/ML data practices

Addressing ethical issues in youth-built GLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Youth design generative language models

Teenagers build babyGPT screenplay generator

Engage youth in AI data practices
