AI Summary
To address large language models' (LLMs') limited comprehension of three-dimensional scientific concepts, which impairs accuracy in automated short-answer grading for science education, this paper proposes an adaptive retrieval-augmented generation (RAG) framework. The method introduces a dynamic retrieval mechanism that jointly models the question prompt and the student response, combining semantic vector search with a curated educational knowledge base to enable task-aware, fine-grained knowledge injection. It further pairs prompt engineering with lightweight domain-specific fine-tuning to improve pedagogical adaptability. Experiments on multiple science education benchmark datasets show consistent gains in short-answer scoring accuracy, outperforming standard LLM baselines by 5.2–8.7 percentage points. These results support the framework's effectiveness and practicality as an accurate, interpretable AI teaching assistant for science education.
Abstract
Short-answer assessment is a vital component of science education, enabling evaluation of students' complex three-dimensional understanding. Large language models (LLMs), with their human-like ability on linguistic tasks, are increasingly used to assist human graders and reduce their workload. However, LLMs' limited domain knowledge restricts their grasp of task-specific requirements and hinders their ability to achieve satisfactory performance. Retrieval-augmented generation (RAG) offers a promising solution by giving LLMs access to relevant domain-specific knowledge during assessment. In this work, we propose an adaptive RAG framework for automated grading that dynamically retrieves and incorporates domain-specific knowledge based on the question and the student's answer. Our approach combines semantic search with curated educational sources to retrieve relevant reference materials. Experimental results on a science education dataset demonstrate that our system improves grading accuracy over baseline LLM approaches. The findings suggest that RAG-enhanced grading systems can serve as reliable, efficient support for human graders.
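The paper does not include code, but the adaptive retrieval step it describes (query formed jointly from the question and the student's answer, then knowledge injected into the grading prompt) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a toy bag-of-words similarity in place of a real dense sentence encoder, and a three-entry stand-in for the curated educational knowledge base; all names here are hypothetical.

```python
import math
import re
from collections import Counter

# Toy stand-in for the curated educational knowledge base described in the paper.
KNOWLEDGE_BASE = [
    "Photosynthesis converts light energy into chemical energy stored in glucose.",
    "Newton's third law: forces occur in equal and opposite pairs.",
    "Evaporation is a phase change from liquid to gas driven by heat.",
]

def embed(text: str) -> Counter:
    """Bag-of-words 'embedding' for illustration; a real system would use a
    dense sentence encoder to perform semantic vector search."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, answer: str, k: int = 1) -> list[str]:
    # Adaptive step: the query jointly models the question prompt and the
    # student's response, so retrieval adapts to what the student wrote.
    query = embed(question + " " + answer)
    ranked = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: cosine(query, embed(doc)),
                    reverse=True)
    return ranked[:k]

def build_grading_prompt(question: str, answer: str) -> str:
    # Inject the retrieved reference material into the grading prompt
    # that would be sent to the LLM grader.
    refs = "\n".join(retrieve(question, answer))
    return (f"Reference material:\n{refs}\n\n"
            f"Question: {question}\n"
            f"Student answer: {answer}\n"
            "Score the answer from 0 to 3 and briefly justify the score.")
```

In this sketch, an answer mentioning "light energy" and "glucose" pulls the photosynthesis entry into the prompt, so the grader sees reference material matched to both the question and the specific response rather than a fixed rubric alone.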