🤖 AI Summary
This study addresses the ongoing educational debate over whether AI can effectively replace a human programming partner by comparing the effects of GitHub Copilot and human pair programming on novice learners' programming performance, knowledge retention, workload, and emotional experience. In a controlled experiment, 22 participants wrote Python code under time pressure for 20 minutes with Copilot and for 20 minutes with a human teammate, and were retested individually one week later. Participants performed significantly better with Copilot and reported significantly lower workload on several dimensions, but working with a human teammate produced a significantly more positive and arousing emotional experience. Retest performance in the Copilot condition was lower in absolute terms, though not significantly so, and declined more from the initial session, suggesting a possible cost to longer-term learning. By evaluating affective and longitudinal learning outcomes alongside immediate performance, the work highlights the double-edged nature of AI-assisted programming.
📝 Abstract
Code-generating Artificial Intelligence has gained popularity within both professional and educational programming settings over the past several years. While research and pedagogy are beginning to cope with this change, computing students are left to bear the unforeseen consequences of AI amidst a dearth of empirical evidence about its effects. Though pair programming between students is well studied and known to benefit self-efficacy and academic achievement, it remains underutilized and is further threatened by the proposition that AI can replace a human programming partner. In this paper, we present a controlled pair programming study with 22 participants who wrote Python code under time pressure, both in teams of two and individually with GitHub Copilot, for 20 minutes each. They were incentivized by bonus compensation to balance performance with understanding and were retested individually on the programming tasks after a retention interval of one week. Subjective measures of workload and emotion as well as objective measures of performance and learning (retest performance) were collected. Results showed that participants performed significantly better with GitHub Copilot than with their human teammate, and several dimensions of their workload were significantly reduced. However, the emotional effect of the human teammate was significantly more positive and arousing than that of working with Copilot. Furthermore, absolute retest performance was lower in the AI condition, though not significantly so, and the decrement from initial to retest performance was larger in the AI condition. We recommend that educators strongly consider revisiting pair programming as an educational tool in addition to embracing modern AI.