🤖 AI Summary
This study investigates how effectively generative AI can automatically produce instructional slides and examines discrepancies between instructor and student perceptions. Through educator-led narrative evaluations and real classroom experiments, the performance of NotebookLM, Claude, M365 Copilot, Cursor, and Claude Code is systematically compared on factual accuracy, content completeness, and pedagogical appropriateness. Student surveys further assess whether students can judge slide quality and correctly attribute authorship to a human or an AI. The findings show that coding-assistant tools yield the highest-quality slides. Moreover, students generally cannot reliably distinguish AI-generated from human-created slides and exhibit a cognitive bias: they tend to attribute low-quality slides to AI and fail to recognize high-quality slides as AI-generated. This work offers the first systematic comparison of diverse generative AI systems in authentic educational settings, uncovering a significant misalignment between student perceptions and the actual quality of AI-generated slides.
📝 Abstract
As generative AI (GenAI) tools become easily accessible, there is promise in using such tools to support instructors. To that end, this paper examines using GenAI to help generate slides from instructor-authored course notes, emphasizing instructor and student perceptions. We examine an end-to-end education tool (NotebookLM), two general-purpose LLMs (Claude, M365 Copilot), and two coding assistants (Cursor, Claude Code). We first analyze whether GenAI-generated slides are "good" via narrative assessment by educators. We then use the best slides, with some modification, in a real course setting and compare student perceptions of human- and AI-generated slides. We find that coding assistant tools produce the slides that are most accurate, complete, and pedagogically sound. Additionally, students rate GenAI slides as similar in quality to instructor-created slides and cannot reliably identify which slides are AI-generated. We also find a negative correlation between a high quality rating and a high "AI-generated" rating, suggesting that students associate poor quality with AI authorship. These findings highlight promising opportunities for integrating GenAI into instructional design workflows and call for further research on how educators can best harness such tools responsibly and effectively.