Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation

πŸ“… 2025-04-07
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the limited comprehension of three-dimensional scientific concepts by large language models (LLMs), which impairs accuracy in automated short-answer grading in science education, this paper proposes an adaptive retrieval-augmented generation (RAG) framework. The method introduces a dynamic retrieval mechanism that jointly models question prompts and student responses, integrating semantic vector search with a curated educational knowledge base to enable task-aware, fine-grained knowledge injection. Additionally, it synergistically combines prompt engineering with lightweight domain-specific fine-tuning to enhance pedagogical adaptability. Experiments across multiple science education benchmark datasets demonstrate consistent improvements in short-answer scoring accuracy, outperforming standard LLM baselines by 5.2–8.7 percentage points. These results validate the framework’s effectiveness and practicality as a high-accuracy, interpretable AI teaching assistant for science education.

Technology Category

Application Category

πŸ“ Abstract
Short answer assessment is a vital component of science education, allowing evaluation of students' complex three-dimensional understanding. Large language models (LLMs) that possess human-like ability in linguistic tasks are increasingly popular in assisting human graders to reduce their workload. However, LLMs' limitations in domain knowledge restrict their understanding in task-specific requirements and hinder their ability to achieve satisfactory performance. Retrieval-augmented generation (RAG) emerges as a promising solution by enabling LLMs to access relevant domain-specific knowledge during assessment. In this work, we propose an adaptive RAG framework for automated grading that dynamically retrieves and incorporates domain-specific knowledge based on the question and student answer context. Our approach combines semantic search and curated educational sources to retrieve valuable reference materials. Experimental results in a science education dataset demonstrate that our system achieves an improvement in grading accuracy compared to baseline LLM approaches. The findings suggest that RAG-enhanced grading systems can serve as reliable support with efficient performance gains.
Problem

Research questions and friction points this paper is trying to address.

Improving short answer grading accuracy using LLMs
Addressing LLMs' domain knowledge limitations in education
Enhancing grading with dynamic retrieval of educational content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive RAG framework for dynamic knowledge retrieval
Semantic search with curated educational sources
RAG-enhanced grading improves accuracy over baselines