AI-Enabled grading with near-domain data for scaling feedback with human-level accuracy

📅 2025-12-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Manual scoring of open-ended short-answer questions in large-scale educational settings is time-consuming, while existing automated scoring methods suffer from poor generalization. Method: This paper proposes a novel paradigm for automatic short-answer scoring leveraging *near-domain data*—e.g., annotated responses to similar past questions—without requiring predefined rubrics or fine-tuning of large language models (LLMs). We formally define the task and introduce a lightweight, practical framework that integrates machine learning models with cross-question transfer strategies to efficiently exploit historical annotations. Contribution/Results: Experiments demonstrate that our approach significantly outperforms state-of-the-art (SOTA) automated scoring methods and non-fine-tuned LLMs (GPT-3.5, GPT-4, GPT-4o) across multiple standard metrics, achieving improvements of 10–20%. These results validate the effectiveness and deployability of near-domain data-driven approaches in educational automated scoring.
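The paper's framework itself is not reproduced here, but its core idea — reusing human-labeled responses to a similar past question to score new responses, with no rubric and no LLM fine-tuning — can be sketched as a minimal nearest-neighbor scorer. All data, thresholds, and function names below are illustrative assumptions, not the authors' implementation:

```python
from collections import Counter
import math

def bow(text):
    """Lowercased bag-of-words vector as a Counter."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def score_response(response, near_domain, k=3):
    """Predict a grade for `response` by majority vote over the k
    most similar labeled responses from a similar past question."""
    ranked = sorted(near_domain,
                    key=lambda ex: cosine(bow(response), bow(ex[0])),
                    reverse=True)
    top = [label for _, label in ranked[:k]]
    return max(set(top), key=top.count)

# Hypothetical near-domain data: (response, grade) pairs annotated
# for a similar question administered in a previous year.
past = [
    ("photosynthesis converts light energy into chemical energy", "correct"),
    ("plants turn sunlight into chemical energy stored in glucose", "correct"),
    ("plants eat soil to grow", "incorrect"),
    ("the sun heats the plant so it grows", "incorrect"),
]

print(score_response("photosynthesis stores light energy as chemical energy", past))
# → correct
```

In practice the paper pairs stronger machine learning models with cross-question transfer strategies; this sketch only shows why historical annotations alone can carry grading signal across near-domain questions.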

📝 Abstract
Constructed-response questions are crucial to encourage generative processing and test a learner's understanding of core concepts. However, the limited availability of instructor time, large class sizes, and other resource constraints pose significant challenges in providing timely and detailed evaluation, which is crucial for a holistic educational experience. In addition, providing timely and frequent assessments is challenging since manual grading is labor intensive, and automated grading is complex to generalize to every possible response scenario. This paper proposes a novel and practical approach to grade short-answer constructed-response questions. We discuss why this problem is challenging, define the nature of questions on which our method works, and finally propose a framework that instructors can use to evaluate their students' open responses, utilizing near-domain data such as data from similar questions administered in previous years. The proposed method outperforms state-of-the-art machine learning models as well as non-fine-tuned large language models like GPT-3.5, GPT-4, and GPT-4o by a considerable margin of 10–20% in some cases, even after providing the LLMs with reference/model answers. Our framework does not require pre-written grading rubrics and is designed explicitly with practical classroom settings in mind. Our results also reveal exciting insights about learning from near-domain data, including what we term accuracy and data advantages when using human-labeled data, and we believe this is the first work to formalize the problem of automated short-answer grading based on near-domain data.
Problem

Research questions and friction points this paper is trying to address.

Automates grading of short-answer questions for timely feedback
Utilizes near-domain data to improve accuracy without predefined rubrics
Outperforms existing models and LLMs by 10–20% in some cases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses near-domain data for grading accuracy
Outperforms state-of-the-art models without rubrics
Designed for practical classroom scalability
Shyam Agarwal
Department of Computer Science, University of California, Davis, One Shields Ave, Davis, 95616, CA, USA.
Ali Moghimi
Department of Biological and Agricultural Engineering, University of California, Davis, One Shields Ave, Davis, 95616, CA, USA.
Kevin C. Haudek
Associate Professor, Michigan State University
science education, assessment