🤖 AI Summary
In virtual laboratory instruction, educators face challenges in efficiently generating assessment questions aligned with learning objectives, experimental content, and cognitive complexity. This paper proposes a four-component alignment framework, comprising learning-objective interpretation, experimental content analysis, a question taxonomy, and fine-grained prompt control, to systematically align instructor intent, simulation context, and Bloom's taxonomy levels. Leveraging knowledge-unit analysis, relational modeling, and TELeR-based prompt engineering, the method uses large language models (LLMs) to enable natural-language-driven, structured question generation. Evaluated across 19 open-source LLMs, it produced over 1,100 questions with roughly 80% parsability and format adherence exceeding 90%, with larger models yielding the strongest gains, including a 0.8-point improvement in average Likert-scale quality ratings. The approach balances pedagogical appropriateness, customization, and scalability, offering a new paradigm for intelligent educational assessment.
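
To make the four-component structure concrete, the sketch below shows one way the pieces could be composed in Python. All class names, fields, example values, and the prompt wording are hypothetical illustrations inferred from the summary above, not the paper's actual code or interface.

```python
# Hypothetical sketch of the four-component alignment pipeline.
# Names, fields, and prompt wording are illustrative only.
from dataclasses import dataclass

@dataclass
class InstructionalGoal:          # component 1: learning-objective interpretation
    objective: str                # teacher's stated learning objective
    bloom_level: str              # target cognitive level, e.g. "Analyze"

@dataclass
class LabContext:                 # component 2: experimental-content analysis
    knowledge_units: list[str]                 # concepts exposed by the simulation
    relationships: list[tuple[str, str, str]]  # (unit, relation, unit) triples

@dataclass
class QuestionSpec:               # component 3: question taxonomy
    question_format: str          # e.g. "open-ended" or "multiple-choice"
    question_type: str            # e.g. "relational" or "descriptive"

def build_prompt(goal: InstructionalGoal, lab: LabContext, spec: QuestionSpec) -> str:
    """Component 4 (fine-grained prompt control): assemble a structured prompt
    that aligns teacher intent, simulation context, and question intent."""
    relations = "; ".join(f"{a} {r} {b}" for a, r, b in lab.relationships)
    return (
        f"Learning objective: {goal.objective}\n"
        f"Target cognitive level (Bloom): {goal.bloom_level}\n"
        f"Simulation concepts: {', '.join(lab.knowledge_units)}\n"
        f"Concept relationships: {relations}\n"
        f"Write one {spec.question_format} question of the "
        f"'{spec.question_type}' type grounded in the concepts above."
    )

# Example use with made-up values for a pendulum simulation:
prompt = build_prompt(
    InstructionalGoal("Explain how string length affects pendulum period", "Analyze"),
    LabContext(["string length", "period"], [("string length", "increases", "period")]),
    QuestionSpec("open-ended", "relational"),
)
```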
📝 Abstract
Virtual Labs offer valuable opportunities for hands-on, inquiry-based science learning, yet teachers often struggle to adapt them to fit their instructional goals. Third-party materials may not align with classroom needs, and developing custom resources can be time-consuming and difficult to scale. Recent advances in Large Language Models (LLMs) offer a promising avenue for addressing these limitations. In this paper, we introduce a novel alignment framework for instructional goal-aligned question generation, enabling teachers to leverage LLMs to produce simulation-aligned, pedagogically meaningful questions through natural language interaction. The framework integrates four components: instructional goal understanding via teacher-LLM dialogue, lab understanding via knowledge unit and relationship analysis, a question taxonomy for structuring cognitive and pedagogical intent, and the TELeR taxonomy for controlling prompt detail. Early design choices were informed by a small teacher-assisted case study, while our final evaluation analyzed over 1,100 questions from 19 open-source LLMs. Goal and lab understanding ground questions in teacher intent and simulation context; the question taxonomy elevates cognitive demand (open-ended formats and relational types raise quality by 0.29-0.39 points); and optimized TELeR prompts enhance format adherence (80% parsability, >90% adherence). Larger models yield the strongest gains: parsability +37.1%, adherence +25.7%, and average quality +0.8 Likert points.
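
The TELeR taxonomy mentioned above governs how much detail a prompt carries. The sketch below illustrates that general idea with a level-parameterized prompt builder and a simple parsability check; the level wording, the JSON schema, and the check itself are assumptions made for illustration, not the paper's exact prompts or metrics.

```python
# Illustrative TELeR-style prompt levels for question generation.
# The exact level definitions, JSON schema, and metric are assumptions.
import json

BASE_DIRECTIVE = "Generate one assessment question for the pendulum virtual lab."

def teler_prompt(level: int, lab_summary: str) -> str:
    """Build a prompt whose detail grows with `level`, loosely following the
    TELeR idea that higher levels spell out subtasks, criteria, and context."""
    if level <= 1:
        return BASE_DIRECTIVE  # bare one-sentence directive
    parts = [
        BASE_DIRECTIVE,
        "Target the 'Analyze' level of Bloom's taxonomy.",
        "Return JSON with the keys 'question' and 'answer'.",
    ]
    if level >= 3:  # bulleted subtasks
        parts += [
            "Subtasks:",
            "- Pick two related knowledge units from the lab summary.",
            "- Ask the learner to relate or compare them.",
        ]
    if level >= 4:  # evaluation criteria
        parts.append("Criteria: clarity, alignment with the objective, appropriate cognitive demand.")
    if level >= 5:  # extra grounding information
        parts.append(f"Lab summary: {lab_summary}")
    return "\n".join(parts)

def is_parsable(llm_output: str) -> bool:
    """One possible parsability check: the output must be valid JSON
    containing the required keys."""
    try:
        obj = json.loads(llm_output)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and {"question", "answer"} <= obj.keys()
```

Under this reading, the reported 80% parsability and >90% format adherence would correspond to how often model outputs survive a check of this kind and follow the requested structure.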