LLM-Generated Feedback Supports Learning If Learners Choose to Use It

📅 2025-06-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the impact of on-demand, LLM-generated explanatory feedback on tutor training outcomes. Using over 2,600 lesson completion records from 885 tutor learners across seven scenario-based lessons, we employed propensity scoring to mitigate selection bias and compared learner performance among three groups: those who accepted the feedback, those who declined it, and those without access to it. Results show that LLM feedback yields statistically significant, small-to-medium effects (Cohen's *d* = 0.28 and 0.33) in two lessons, with the effects moderated by learners' help-seeking propensity. Posttest scores improved significantly in those two lessons with no increase in completion time, and 92% of learners rated the feedback as helpful. Our key contributions are: (1) the first empirical validation of pedagogically effective, on-demand LLM feedback in tutor training; and (2) open-sourcing of the dataset, prompt templates, and evaluation rubrics, establishing a reproducible methodological framework for AI-driven educational feedback design.

📝 Abstract
Large language models (LLMs) are increasingly used to generate feedback, yet their impact on learning remains underexplored, especially compared to existing feedback methods. This study investigates how on-demand LLM-generated explanatory feedback influences learning in seven scenario-based tutor training lessons. Analyzing over 2,600 lesson completions from 885 tutor learners, we compare posttest performance across three groups: learners who received feedback generated by gpt-3.5-turbo, those who declined it, and those without access. All groups received non-LLM corrective feedback. To address potential selection bias (higher-performing learners may be more inclined to use LLM feedback), we applied propensity scoring. Learners with a higher predicted likelihood of engaging with LLM feedback scored significantly higher at posttest than those with lower propensity. After adjusting for this effect, two out of seven lessons showed statistically significant learning benefits from LLM feedback, with standardized effect sizes of 0.28 and 0.33. These moderate effects suggest that the effectiveness of LLM feedback depends on learners' tendency to seek support. Importantly, LLM feedback did not significantly increase completion time, and learners overwhelmingly rated it as helpful. These findings highlight LLM feedback's potential as a low-cost and scalable way to improve learning on open-ended tasks, particularly in existing systems already providing feedback without LLMs. This work contributes open datasets, LLM prompts, and rubrics to support reproducibility.
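The standardized effect sizes reported above are Cohen's *d* values. As a minimal sketch of how such a value is computed, assuming the common pooled-standard-deviation formulation (the score arrays below are hypothetical, not the paper's data):

```python
import numpy as np

def cohens_d(group_a: np.ndarray, group_b: np.ndarray) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * group_a.var(ddof=1) +
                  (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)
    return (group_a.mean() - group_b.mean()) / np.sqrt(pooled_var)

# Hypothetical posttest scores (proportion correct), for illustration only.
accepted_llm = np.array([0.82, 0.91, 0.75, 0.88, 0.79, 0.86])
declined_llm = np.array([0.70, 0.85, 0.66, 0.74, 0.78, 0.72])
print(f"Cohen's d = {cohens_d(accepted_llm, declined_llm):.2f}")
```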
Problem

Research questions and friction points this paper is trying to address.

Investigates the impact of on-demand LLM-generated feedback on learning outcomes
Compares the effectiveness of LLM feedback against existing non-LLM feedback methods
Assesses how learners' help-seeking propensity moderates the benefits of LLM feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

On-demand LLM-generated explanatory feedback for tutor training
Propensity scoring to adjust for selection bias (see the sketch below)
Open datasets, prompts, and rubrics for reproducibility
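A minimal sketch of the propensity-scoring idea, assuming logistic regression on learner covariates followed by greedy nearest-neighbor matching. The covariate names, caliper, and synthetic data below are hypothetical; the paper's exact adjustment procedure may differ:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical learner records; the paper's actual covariates are not listed here.
df = pd.DataFrame({
    "pretest": rng.random(200),                # prior-knowledge proxy
    "prior_lessons": rng.integers(0, 7, 200),  # lessons already completed
    "used_feedback": rng.integers(0, 2, 200),  # 1 = accepted LLM feedback
    "posttest": rng.random(200),
})

# Step 1: model each learner's propensity to engage with LLM feedback.
X = df[["pretest", "prior_lessons"]]
propensity_model = LogisticRegression().fit(X, df["used_feedback"])
df["propensity"] = propensity_model.predict_proba(X)[:, 1]

# Step 2: greedy nearest-neighbor matching on propensity within a caliper.
treated = df[df["used_feedback"] == 1]
control = df[df["used_feedback"] == 0].copy()
pairs = []
for _, row in treated.iterrows():
    if control.empty:
        break
    gaps = (control["propensity"] - row["propensity"]).abs()
    j = gaps.idxmin()
    if gaps[j] < 0.05:  # caliper: only match sufficiently close propensities
        pairs.append((row["posttest"], control.loc[j, "posttest"]))
        control = control.drop(index=j)

# Step 3: compare posttest performance within the matched sample.
t_scores, c_scores = map(np.array, zip(*pairs))
print(f"{len(pairs)} matched pairs, mean posttest difference: "
      f"{(t_scores - c_scores).mean():+.3f}")
```

Comparing posttest scores within matched pairs, rather than across the raw accept/decline groups, is what removes the help-seeking selection effect the paper describes.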