🤖 AI Summary
This study investigates how the proliferation of large language models (LLMs) may alter students’ programming behaviors and affect learning outcomes and assessment validity. Using real-world code submissions from a graduate-level cloud computing course over five years, we employ a quasi-longitudinal design to compare student behavior across the five semesters before and the five semesters after the release of ChatGPT. Using quantitative metrics, including edit distance between consecutive submissions, final submission length, and score improvement, combined with statistical analysis, we provide empirical evidence of shifts in programming behavior and learning that coincide with the availability of LLMs. Results indicate that post-ChatGPT, students submitted longer final code and made larger edits between submissions yet achieved smaller score gains; these behavioral changes were significantly correlated with overall performance. Although the changes cannot be definitively attributed to LLM misuse, they are consistent with overreliance on LLMs impairing learning, and they warrant a critical reevaluation of current educational assessment practices.
📝 Abstract
The widespread availability of large language models (LLMs) has changed how students engage with coding and problem-solving. While these tools may increase student productivity, they also make it more difficult for instructors to assess students' learning and effort. In this quasi-longitudinal study, we analyze five years of student source code submissions in a graduate-level cloud computing course, focusing on an assignment that remained unchanged and examining students' behavior over a period spanning the five semesters before the release of ChatGPT and the five semesters after. Student coding behavior has changed significantly since Fall 2022. The length of students' final submissions increased. Between consecutive submissions, average edit distances increased while average score improvement decreased, suggesting that both student productivity and learning have decreased since ChatGPT's release. Additionally, there are statistically significant correlations between these behavioral changes and students' overall performance. Although we cannot definitively attribute these changes to LLM misuse, they are consistent with our hypothesis that some students are over-reliant on LLMs, which is negatively affecting their learning outcomes. Our findings raise an alarm about the first generation of graduates in the age of LLMs, calling upon both educators and employers to reflect on how they evaluate genuine expertise and productivity.
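The per-student metrics named in the abstract (final submission length, average edit distance between consecutive submissions, and average score improvement) can be illustrated with a minimal sketch. The abstract does not say how edit distance or length were measured, so the character-level Levenshtein distance, the character-count length, and the names `edit_distance` and `consecutive_metrics` below are assumptions made for illustration only, not the authors' implementation.

```python
from statistics import mean


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (an assumed granularity)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def consecutive_metrics(submissions):
    """Summarize one student's chronologically ordered submissions.

    `submissions` is a hypothetical list of (source_code, score) tuples.
    Returns None when there are fewer than two submissions, since no
    consecutive pair exists to compare.
    """
    pairs = list(zip(submissions, submissions[1:]))
    if not pairs:
        return None
    distances = [edit_distance(old[0], new[0]) for old, new in pairs]
    gains = [new[1] - old[1] for old, new in pairs]
    return {
        "final_length": len(submissions[-1][0]),  # characters in final code
        "avg_edit_distance": mean(distances),
        "avg_score_improvement": mean(gains),
    }
```

Under these assumptions, a pattern like the one the abstract describes would show up as a higher `avg_edit_distance` paired with a lower `avg_score_improvement` for post-ChatGPT cohorts; comparing cohorts would still require the statistical tests the study refers to, which this sketch does not attempt.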