🤖 AI Summary
This work addresses the critical challenge of scarce labeled data, which severely limits automated essay scoring (AES) in real-world settings. To overcome this bottleneck, the authors propose a three-stage approach that, for the first time, unifies two-stage low-rank adaptation (LoRA) fine-tuning, alignment of predicted and ground-truth score distributions, and uncertainty-aware pseudo-labeling via self-training. Built on the DualBERT architecture, the method achieves 91.2% of full-data performance (approximately 1,000 labeled essays) using only 32 annotated samples. Moreover, in the full-data setting, the proposed score alignment strategy sets a new state-of-the-art result, significantly alleviating the data scarcity problem in low-resource AES.
📝 Abstract
Automated Essay Scoring (AES) plays a crucial role in education by providing scalable and efficient assessment tools. However, in real-world settings, the extreme scarcity of labeled data severely limits the development and practical adoption of robust AES systems. This study proposes a novel approach to enhance AES performance in both limited-data and full-data settings by introducing three key techniques. First, we introduce a two-stage fine-tuning strategy that leverages low-rank adaptation (LoRA) to better adapt an AES model to essays from the target prompt. Second, we introduce a Score Alignment technique that improves consistency between the predicted and true score distributions. Third, we employ uncertainty-aware self-training on unlabeled data, effectively expanding the training set with pseudo-labeled samples while mitigating the propagation of label noise. We implement all three techniques on top of DualBERT and conduct extensive experiments on the ASAP++ dataset. In the 32-sample setting, each of the three techniques improves performance, and their integration achieves 91.2% of the performance of a model trained on the full set of approximately 1,000 labeled samples. In addition, the proposed Score Alignment technique consistently improves performance in both limited-data and full-data settings: e.g., it achieves state-of-the-art results in the full-data setting when integrated into DualBERT.
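The paper specifies the exact alignment and self-training procedures; as a rough, hypothetical illustration of the two ideas, the numpy-only sketch below shows one simple form of score alignment (linearly matching the predicted score distribution's mean and standard deviation to the gold distribution) and one simple form of uncertainty-aware pseudo-label selection (keeping only unlabeled samples whose predictions vary little across stochastic forward passes, e.g. MC dropout). The function names and the variance threshold are our assumptions, not the authors' implementation.

```python
import numpy as np

def align_scores(preds, gold_mean, gold_std):
    """Rescale predicted scores so their mean/std match the gold score
    distribution (a minimal sketch of distribution-level score alignment)."""
    mu, sigma = preds.mean(), preds.std()
    if sigma == 0:  # degenerate case: all predictions identical
        return np.full_like(preds, gold_mean)
    return (preds - mu) / sigma * gold_std + gold_mean

def select_pseudo_labels(mc_preds, var_threshold=0.05):
    """mc_preds: array of shape (T, N) holding predictions from T stochastic
    passes over N unlabeled essays. Samples with predictive variance below
    the (assumed) threshold are kept as pseudo-labeled training examples."""
    mean = mc_preds.mean(axis=0)   # pseudo-label = mean prediction
    var = mc_preds.var(axis=0)     # uncertainty proxy = prediction variance
    keep = var < var_threshold
    return mean[keep], keep
```

In a self-training loop, the selected pseudo-labeled essays would be added to the labeled pool and the model retrained; high-variance samples stay unlabeled, which is what mitigates label-noise propagation.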