Time Is Effort: Estimating Human Post-Editing Time for Grammar Error Correction Tool Evaluation

📅 2025-10-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses the lack of human-centered efficiency evaluation for grammatical error correction (GEC) tools by proposing **post-editing (PE) time** as a core metric to quantify labor cost savings. Methodologically, we introduce the first large-scale **PE-time-annotated dataset** and propose **Post-Editing Effort in Time (PEET)**, a time-aware scoring metric derived via human annotation, edit-type modeling, and statistical regression to correct for confounding factors. Correlation analysis confirms PEET's high alignment with human judgments. Our contributions are threefold: (1) establishing the first time-aware GEC evaluation benchmark; (2) enabling practical ranking of GEC tools by actual time savings; and (3) shifting the evaluation paradigm from system-centric performance metrics toward human-perceived efficacy. Experiments on BEA19 and CoNLL-2014 demonstrate strong agreement between PEET and human rankings (Spearman's ρ > 0.9), robustly quantifying differential time savings across tools.
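The paper's actual scorer lives in the linked repository; as a rough, hedged illustration of the regression idea sketched above, the snippet below fits per-edit-type time coefficients from PE-time annotations and sums predicted times to score a system. The feature columns and all numbers are toy assumptions, not the authors' data or model.

```python
# Minimal sketch of a regression-based PE-time scorer, in the spirit of
# PEET. Assumption (not from the paper): per-sentence edit-type counts
# are the features; every value below is a toy placeholder.
import numpy as np
from sklearn.linear_model import LinearRegression

# Rows = annotated sentences; columns = counts of
# [punctuation edits, paraphrase edits, spelling edits].
X_train = np.array([[2, 0, 1], [0, 1, 0], [1, 2, 3], [0, 0, 0]])
# Measured post-editing time per sentence, in seconds.
y_train = np.array([14.0, 21.0, 55.0, 3.0])

model = LinearRegression().fit(X_train, y_train)

# Score a GEC system by summing predicted PE time over its outputs:
# lower = less human effort left after the tool's first pass.
X_system = np.array([[1, 0, 0], [0, 1, 1]])
print(f"Estimated remaining PE time: {model.predict(X_system).sum():.1f}s")
```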

๐Ÿ“ Abstract
Text editing can involve several iterations of revision. Incorporating an efficient Grammar Error Correction (GEC) tool in the initial correction round can significantly impact further human editing effort and final text quality. This raises an interesting question for quantifying GEC tool usability: how much effort can a GEC tool save users? We present the first large-scale dataset of post-editing (PE) time annotations and corrections for two English GEC test datasets (BEA19 and CoNLL14). We introduce Post-Editing Effort in Time (PEET) as a human-focused evaluation scorer that ranks any GEC tool by estimating its PE time-to-correct. Using our dataset, we quantify the amount of time GEC tools save in text editing. Analyzing edit types indicated that determining whether a sentence needs correction, along with edits like paraphrasing and punctuation changes, had the greatest impact on PE time. Finally, comparison with human rankings shows that PEET correlates well with technical effort judgments, providing a new human-centric direction for evaluating GEC tool usability. We release our dataset and code at: https://github.com/ankitvad/PEET_Scorer.
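To make the abstract's ranking comparison concrete, here is a minimal sketch of the kind of rank-agreement check it describes, using SciPy's `spearmanr`; the tool scores and human ranks are invented for illustration, not the paper's results.

```python
# Toy rank-agreement check between PEET-style scores and human effort
# rankings, mirroring the Spearman's rho comparison in the abstract.
from scipy.stats import spearmanr

# Estimated PE time in seconds for five GEC tools (lower = better).
peet_seconds = [120.5, 98.2, 150.0, 87.4, 110.3]
# Human effort ranking of the same tools (1 = least effort).
human_rank = [4, 1, 5, 2, 3]

rho, p_value = spearmanr(peet_seconds, human_rank)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
```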
Problem

Research questions and friction points this paper is trying to address.

Quantifying human post-editing time for grammar error correction tools
Evaluating GEC tool usability through time-saving estimation metrics
Analyzing edit types impacting human correction effort in text editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduced PEET scorer for GEC tool evaluation
Created large-scale dataset with post-editing time annotations
Quantified time savings by analyzing edit-type impact (see the sketch below)
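One simple way to read edit-type impact off a fitted time model, continuing the toy regression from the summary above, is to inspect its per-edit-type coefficients as an approximate time cost per edit. This is a hedged sketch with invented values, not the authors' analysis.

```python
# Toy illustration: fitted coefficients as approximate seconds per edit,
# one way to ask which edit types drive PE time. Values are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

edit_types = ["punctuation", "paraphrase", "spelling"]
X = np.array([[2, 0, 1], [0, 1, 0], [1, 2, 3], [0, 0, 0], [3, 1, 0]])
y = np.array([14.0, 21.0, 55.0, 3.0, 40.0])

coef = LinearRegression().fit(X, y).coef_
for name, seconds in sorted(zip(edit_types, coef), key=lambda t: -t[1]):
    print(f"{name}: ~{seconds:.1f}s per edit")
```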
Ankit Vadehra
University of Waterloo, Vector Institute
Bill Johnson
Scribendi Inc.
Gene Saunders
Scribendi Inc.
Pascal Poupart
University of Waterloo
Artificial Intelligence · Machine Learning · Reinforcement Learning · Federated Learning · NLP