Training data generation for context-dependent rubric-based short answer grading

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenges in automated scoring of constructed-response items in PISA assessments, where human scoring is susceptible to linguistic variation and rater bias, while existing automated approaches are hindered by the scarcity of domain-specific training data. To overcome this limitation, the authors propose a method that leverages only a small amount of confidential reference data, combining rule-based text transformations with prompt engineering to generate high-quality, contextually relevant synthetic training data. This approach enhances data utility while preserving privacy. Three synthetic datasets were constructed, exhibiting surface-level characteristics closely aligned with the original data. Preliminary experiments demonstrate that one of the derived formats significantly improves the performance of automated scoring models during training.
📝 Abstract
Every 4 years, the PISA test is administered by the OECD to test the knowledge of teenage students worldwide and allow for comparisons of educational systems. However, having to avoid language differences and annotator bias makes the grading of student answers challenging. For these reasons, it would be interesting to compare methods of automatic student answer grading. To train some of these methods, which require machine learning, or to compute parameters or select hyperparameters for those that do not, a large amount of domain-specific data is needed. In this work, we explore a small number of methods for creating a large-scale training dataset using only a relatively small confidential dataset as a reference, leveraging a set of very simple derived text formats to preserve confidentiality. Using these methods, we successfully created three surrogate datasets that are, at the very least, superficially more similar to the reference dataset than purely the result of prompt-based generation. Early experiments suggest one of these approaches might also lead to improved model training.
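The abstract refers to "very simple derived text formats" that preserve confidentiality while keeping surface-level similarity, but does not spell them out. As a purely hypothetical sketch of what such a format could look like (the function name and character-class scheme below are assumptions, not the paper's method), one option is to map each character of a student answer to a coarse class token, so word lengths, casing, digits, and punctuation survive while the actual content is hidden:

```python
def to_shape_format(text: str) -> str:
    """Illustrative character-class transform (not from the paper):
    lowercase letters -> 'a', uppercase -> 'A', digits -> '9',
    whitespace and punctuation kept as-is. Surface statistics such as
    token lengths and punctuation patterns are preserved."""
    out = []
    for ch in text:
        if ch.isalpha():
            out.append("a" if ch.islower() else "A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)  # keep spaces and punctuation unchanged
    return "".join(out)

print(to_shape_format("The cell has 2 membranes."))
# prints "Aaa aaaa aaa 9 aaaaaaaaa."
```

A format like this could serve as a reference for validating that synthetic answers match the originals in shape without ever exposing the confidential text itself.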
Problem

Research questions and friction points this paper is trying to address.

automatic grading
training data generation
PISA
rubric-based scoring
data confidentiality
Innovation

Methods, ideas, or system contributions that make the work stand out.

training data generation
automatic short answer grading
data confidentiality
surrogate dataset
PISA
Pavel Šindelář
Master's Student, Charles University
natural language processing
Dávid Slivka
Faculty of Mathematics and Physics at Charles University
Christopher Bouma
Faculty of Mathematics and Physics at Charles University
Filip Prášil
Faculty of Mathematics and Physics at Charles University
Ondřej Bojar
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics
machine translation, speech translation, parsing, treebanking