LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing

📅 2025-02-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the capability of large language models (LLMs) to perform multidimensional analytic assessment of second-language (L2) graduate students’ academic English writing—specifically, whether LLMs can simultaneously generate reliable scores and explanatory feedback across nine predefined criteria. Method: We propose an interpretable, low-cost, scalable, and reproducible automated assessment framework that replaces labor-intensive human scoring. Our approach integrates multi-prompt strategies with state-of-the-art LLMs, a custom-built L2 academic writing corpus, expert-derived multidimensional annotation guidelines, and rule-based enhancement mechanisms. Contribution/Results: Experimental results demonstrate that LLM-generated scores and feedback are overall reasonable, stable, and interpretable. This work constitutes the first systematic validation of LLMs’ reliability and validity in multidimensional writing assessment. To ensure full reproducibility, we publicly release the annotated corpus, annotation specifications, and evaluation code.

📝 Abstract
The paper explores the performance of LLMs in the context of multi-dimensional analytic writing assessments, i.e., their ability to provide both scores and comments based on multiple assessment criteria. Using a corpus of literature reviews written by L2 graduate students and assessed by human experts against 9 analytic criteria, we prompt several popular LLMs to perform the same task under various conditions. To evaluate the quality of feedback comments, we apply a novel feedback comment quality evaluation framework. This framework is interpretable, cost-efficient, scalable, and reproducible, compared to existing methods that rely on manual judgments. We find that LLMs can generate reasonably good and generally reliable multi-dimensional analytic assessments. We release our corpus for reproducibility.
Problem

Research questions and friction points this paper is trying to address.

LLMs assess L2 academic writing
Evaluate multi-dimensional writing criteria
Develop scalable feedback evaluation framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs perform multi-dimensional assessments
Novel feedback comment quality framework
Corpus released for reproducibility
Zhengxiang Wang
Department of Linguistics & IACS, Stony Brook University, USA
Veronika Makarova
Department of Linguistics, University of Saskatchewan, Canada
Zhi Li
Department of Linguistics, University of Saskatchewan, Canada
Jordan Kodner
Stony Brook University
Computational Linguistics · Language Acquisition · Language Change
Owen Rambow
Stony Brook University
Natural Language Processing · Computational Linguistics · Computational Social Science