📝 Abstract
Large language models (LLMs) are quickly being adopted in a wide range of learning experiences, especially via ubiquitous and broadly accessible chat interfaces like ChatGPT and Copilot. This type of interface is readily available to students and teachers around the world, yet relatively little research has been done to assess the impact of such generic tools on student learning. Coding education is an interesting test case, both because LLMs have strong performance on coding tasks and because LLM-powered support tools are rapidly becoming part of the workflow of professional software engineers. To help understand the impact of generic LLM use on coding education, we conducted a large-scale randomized controlled trial with 5,831 students from 146 countries in an online coding class, in which we provided some students with access to a chat interface powered by GPT-4. We estimate positive benefits on exam performance for adopters, the students who used the tool, but across all students, the advertisement of GPT-4 led to a significant average decrease in exam participation. We observe similar decreases in other forms of course engagement. However, this decrease is modulated by the student's country of origin: offering access to LLMs to students from countries with a low human development index increased their exam participation rate on average. Our results suggest there may be promising benefits to using LLMs in an introductory coding class, but also potential harms to engagement, which makes their longer-term impact on student success unclear. Our work highlights the need for additional investigations to help understand the potential impact of future adoption and integration of LLMs into classrooms.