Trait-Aware Policy Optimization for Autoregressive Multi-Trait Essay Scoring

📅 2026-05-25
🤖 AI Summary
This work addresses the difficulty autoregressive models have in scoring an essay accurately on several traits at once. The authors propose TAPO (Trait-Aware Policy Optimization), a post-training framework that builds trait-level scoring objectives directly into policy optimization. TAPO decomposes the reward signal across both samples and traits, combining global scoring consistency, trait-level accuracy, format validity, and preservation of inter-trait dependencies, and pairs this with supervised fine-tuning on enhanced prompts. Across multiple backbone language models, TAPO consistently outperforms standard supervised fine-tuning and scalar-reward optimization baselines on multi-trait scoring, which the authors present as evidence that the approach transfers across models.
📝 Abstract
Multi-trait essay scoring aims to provide fine-grained evaluation of writing quality across multiple dimensions. However, how to effectively post-train autoregressive scoring models remains underexplored. In this paper, we propose Trait-Aware Policy Optimization (TAPO), a post-training framework tailored to autoregressive multi-trait scoring. Our method decomposes rewards along both the sample and trait dimensions, combining global scoring consistency, trait-level accuracy, format validity, and inter-trait dependency preservation. In addition, we strengthen supervised fine-tuning with enhanced prompts, allowing the model to internalize trait semantics before preference optimization. Experiments across multiple backbone models show that our method consistently improves multi-trait scoring performance over supervised fine-tuning and scalar-reward optimization baselines, demonstrating the effectiveness and transferability of trait-aware post-training for essay scoring.
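The abstract names four reward components but gives no formulas, so the following is only a minimal sketch of what a trait-decomposed reward could look like, not the authors' formulation. The trait names, the 0 to 5 score range, the JSON output format, the distance-based component definitions, and the equal weights are all assumptions made for illustration.

```python
import json
from statistics import mean

# Hypothetical trait names and score range; the paper's actual rubric is not given here.
TRAITS = ("content", "organization", "language")
MAX_SCORE = 5


def parse_scores(text):
    """Return {trait: int} if text is a JSON object with exactly the expected traits, else None."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != set(TRAITS):
        return None
    for value in data.values():
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            return None
        if not 0 <= value <= MAX_SCORE:
            return None
    return data


def dependency_reward(pred, gold):
    """Reward preserving the gold differences between every pair of traits (assumed definition)."""
    pairs = [(a, b) for i, a in enumerate(TRAITS) for b in TRAITS[i + 1:]]
    errors = [
        abs((pred[a] - pred[b]) - (gold[a] - gold[b])) / (2 * MAX_SCORE)
        for a, b in pairs
    ]
    return 1.0 - mean(errors)


def decomposed_reward(text, gold, weights=(0.25, 0.25, 0.25, 0.25)):
    """Score one sampled output; returns every component plus a weighted total."""
    pred = parse_scores(text)
    if pred is None:
        # Invalid format: no component can be evaluated, so everything is zero.
        return {
            "format": 0.0,
            "traits": {t: 0.0 for t in TRAITS},
            "global": 0.0,
            "dependency": 0.0,
            "total": 0.0,
        }
    # Per-trait accuracy: 1 minus the normalized absolute error on that trait.
    traits = {t: 1.0 - abs(pred[t] - gold[t]) / MAX_SCORE for t in TRAITS}
    # Global consistency: closeness of the summed score to the gold sum.
    global_r = 1.0 - abs(sum(pred.values()) - sum(gold.values())) / (MAX_SCORE * len(TRAITS))
    dep = dependency_reward(pred, gold)
    w_fmt, w_trait, w_glob, w_dep = weights
    total = w_fmt * 1.0 + w_trait * mean(traits.values()) + w_glob * global_r + w_dep * dep
    return {"format": 1.0, "traits": traits, "global": global_r, "dependency": dep, "total": total}


gold = {"content": 4, "organization": 3, "language": 5}
result = decomposed_reward('{"content": 3, "organization": 3, "language": 5}', gold)
print(result["traits"], round(result["total"], 3))
```

Returning the per-trait rewards alongside the total is the point of the sketch: a scalar-reward baseline would see only `total`, while a trait-aware optimizer could use the `traits` entries to assign credit to each score separately.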
Problem

Research questions and friction points this paper is trying to address.

multi-trait essay scoring
autoregressive scoring models
post-training
trait-aware evaluation
fine-grained writing assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trait-Aware Policy Optimization
autoregressive scoring
multi-trait essay scoring
reward decomposition
enhanced prompting