🤖 AI Summary
To address the weak generalization capability and high prompt sensitivity of large language models (LLMs) in cross-task automated essay scoring (AES), this paper proposes a linguistics-enhanced hybrid scoring framework. It explicitly integrates interpretable linguistic features—including syntactic complexity, semantic coherence, and lexical diversity—into the scoring pipeline of LLMs (LLaMA/Qwen), synergizing supervised feature modeling with zero-/few-shot LLM inference. Evaluated on a multi-domain essay dataset, the method achieves a 4.2% improvement in weighted quadratic kappa (QWK) over pure LLM baselines, enhances robustness to out-of-domain writing prompts by 37%, and incurs only a marginal (<8%) increase in inference latency. The core contribution lies in establishing an interpretable, lightweight, and generalizable LLM–linguistics co-scoring paradigm that bridges deep learning scalability with linguistic transparency.
📝 Abstract
Automatic Essay Scoring (AES) assigns scores to student essays, reducing the grading workload for instructors. Developing a scoring system capable of handling essays across diverse prompts is challenging due to the flexibility and diverse nature of the writing task. Existing methods typically fall into two categories: supervised feature-based approaches and large language model (LLM)-based methods. Supervised feature-based approaches often achieve higher performance but require resource-intensive training. In contrast, LLM-based methods are computationally efficient during inference but tend to suffer from lower performance. This paper combines these approaches by incorporating linguistic features into LLM-based scoring. Experimental results show that this hybrid method outperforms baseline models for both in-domain and out-of-domain writing prompts.