🤖 AI Summary
Graphical programming education typically assesses student comprehension through task completion, which fails to capture deeper program understanding. To address this limitation, we propose the first automated method for generating comprehension questions tailored to individual Scratch programs. Building on the static analysis tool LitterBox, our approach combines structural and semantic analysis of Scratch code to instantiate 30 question templates grounded in an established program comprehension model, enabling scalable, fine-grained question generation. Applied to 600,913 real-world Scratch projects, it produced over 54 million questions. An initial empirical evaluation indicates that the generated questions are meaningful, and student performance on them correlates significantly with course grades (p < 0.01). This work pioneers program comprehension question generation for block-based programming and establishes a scalable paradigm for automated formative assessment.
📝 Abstract
When learning to program, students are usually assessed based on the code they write. However, the mere completion of a programming task does not guarantee actual comprehension of the underlying concepts. Asking learners questions about the code they wrote has therefore been proposed as a means to assess program comprehension. Since creating targeted questions for individual student programs is tedious and challenging, prior work has proposed generating such questions automatically. In this paper, we generalize this idea to the block-based programming language Scratch. We propose a set of 30 question templates for Scratch code covering an established program comprehension model, and extend the LitterBox static analysis tool to automatically instantiate the corresponding questions for a given Scratch program. On a dataset of 600,913 projects we generated 54,118,694 questions automatically. Our initial experiments with 34 ninth graders demonstrate that this approach can indeed generate meaningful questions for Scratch programs, and we find that students' ability to answer these questions about their own programs relates to their overall performance.
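The core idea of instantiating question templates from statically analyzed code can be sketched as follows. This is a minimal illustration only, not the actual LitterBox implementation (LitterBox is a Java tool); the block representation, opcode names, and templates here are hypothetical stand-ins for the paper's 30 templates.

```python
# Hypothetical sketch: template-based question generation from a parsed
# Scratch program. All names (ScratchBlock, TEMPLATES, opcodes) are
# illustrative assumptions, not the real LitterBox API.
from dataclasses import dataclass

@dataclass
class ScratchBlock:
    opcode: str   # e.g. "event_whenflagclicked"
    sprite: str   # name of the sprite the block belongs to

# Question templates keyed by opcode; placeholders are filled in
# from the statically extracted program structure.
TEMPLATES = {
    "event_whenflagclicked":
        "What happens in sprite '{sprite}' when the green flag is clicked?",
    "control_repeat":
        "How many times does the loop in sprite '{sprite}' repeat?",
}

def generate_questions(blocks):
    """Instantiate one question per block whose opcode has a template."""
    return [TEMPLATES[b.opcode].format(sprite=b.sprite)
            for b in blocks if b.opcode in TEMPLATES]

program = [ScratchBlock("event_whenflagclicked", "Cat"),
           ScratchBlock("control_repeat", "Cat")]
for q in generate_questions(program):
    print(q)
```

Because the questions are derived from the learner's own program, each student receives questions tailored to the code they actually wrote, which is what makes the approach scale to hundreds of thousands of projects.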