DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing

📅 2024-02-21
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
Current automatic essay scoring (AES) models for English-as-a-foreign-language (EFL) writing instruction face two critical bottlenecks: (1) scarcity of datasets aligned with authentic pedagogical contexts, and (2) reliance on holistic scoring, which impedes fine-grained, actionable feedback. To address these, we introduce DREsS—the first fine-grained, rubric-driven AES dataset tailored for EFL instruction—comprising 42.4K essays drawn from real classrooms, standardized tests, and synthetically augmented sources. We further propose the first multidimensional, pedagogically grounded human annotation rubric specifically designed for EFL teaching needs. Additionally, we present CASE—a corruption-based data augmentation strategy leveraging semantic and syntactic perturbations to enhance lexical and structural diversity. Empirical evaluation demonstrates that CASE improves baseline results by 45.44%. This work establishes a high-quality data foundation and methodological framework for developing practical, interpretable, fine-grained AES systems in EFL education.

📝 Abstract
Automated essay scoring (AES) is a useful tool in English as a Foreign Language (EFL) writing education, offering real-time essay scores for students and instructors. However, previous AES models were trained on essays and scores irrelevant to the practical scenarios of EFL writing education and usually provided a single holistic score due to the lack of appropriate datasets. In this paper, we release DREsS, a large-scale, standard dataset for rubric-based automated essay scoring. DREsS comprises three sub-datasets: DREsS_New, DREsS_Std., and DREsS_CASE. We collect DREsS_New, a real-classroom dataset with 2.3K essays authored by EFL undergraduate students and scored by English education experts. We also standardize existing rubric-based essay scoring datasets as DREsS_Std. We suggest CASE, a corruption-based augmentation strategy for essays, which generates 40.1K synthetic samples of DREsS_CASE and improves the baseline results by 45.44%. DREsS will enable further research to provide a more accurate and practical AES system for EFL writing education.
Problem

Research questions and friction points this paper is trying to address.

Lack of appropriate datasets for rubric-based AES in EFL education
Previous AES models provide only a single holistic score rather than rubric-based, fine-grained scores
Need for accurate and practical AES systems in EFL writing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale rubric-based dataset DREsS
Real-classroom essays scored by experts
Corruption-based augmentation strategy CASE
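The abstract describes CASE only at a high level: it corrupts essays to generate synthetic rubric-scored samples. A minimal illustrative sketch of the general idea follows; the corruption operation (sentence-level word shuffling) and the proportional score-reduction heuristic are assumptions for illustration, not the paper's exact recipe.

```python
import random


def corrupt_essay(essay: str, score: float,
                  corruption_rate: float = 0.3, seed: int = 0):
    """Corruption-based augmentation sketch: shuffle the words of a
    fraction of sentences, then lower the rubric score in proportion
    to the corrupted fraction. (Hypothetical heuristic, not CASE's
    actual scoring rule.)"""
    rng = random.Random(seed)
    sentences = [s.strip() for s in essay.split(".") if s.strip()]
    n_corrupt = max(1, int(len(sentences) * corruption_rate))
    for i in rng.sample(range(len(sentences)), n_corrupt):
        words = sentences[i].split()
        rng.shuffle(words)  # destroys syntax within the sentence
        sentences[i] = " ".join(words)
    corrupted = ". ".join(sentences) + "."
    # Assumed label heuristic: score drops by the corrupted fraction.
    new_score = round(score * (1 - n_corrupt / len(sentences)), 2)
    return corrupted, new_score
```

Pairing each original essay with several corrupted variants at reduced scores is one plausible way such a strategy could expand 2.3K real essays into a much larger synthetic training set.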