Time Is Effort: Estimating Human Post-Editing Time for Grammar Error Correction Tool Evaluation

📅 2025-10-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses the lack of human-centered efficiency evaluation for grammatical error correction (GEC) tools by proposing **post-editing (PE) time** as a core metric to quantify labor cost savings. Methodologically, we introduce the first large-scale **PE-time-annotated dataset** and propose **Post-Editing Effort in Time (PEET)**, a time-aware scoring metric derived via human annotation, edit-type modeling, and statistical regression to correct for confounding factors. Correlation analysis confirms PEET's high alignment with human judgments. Our contributions are threefold: (1) establishing the first time-aware GEC evaluation benchmark; (2) enabling practical ranking of GEC tools by actual time savings; and (3) shifting the evaluation paradigm from system-centric performance metrics toward human-perceived efficacy. Experiments on BEA19 and CoNLL-2014 demonstrate strong agreement between PEET and human rankings (Spearman's ρ > 0.9), robustly quantifying differential time savings across tools.
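The paper's actual scorer lives in the linked repository; as a rough, hedged illustration of the regression idea sketched above, the snippet below fits per-edit-type time coefficients from PE-time annotations and sums predicted times to score a system. The feature columns and all numbers are toy assumptions, not the authors' data or model.

```python
# Minimal sketch of a regression-based PE-time scorer, in the spirit of
# PEET. Assumption (not from the paper): per-sentence edit-type counts
# are the features; every value below is a toy placeholder.
import numpy as np
from sklearn.linear_model import LinearRegression

# Rows = annotated sentences; columns = counts of
# [punctuation edits, paraphrase edits, spelling edits].
X_train = np.array([[2, 0, 1], [0, 1, 0], [1, 2, 3], [0, 0, 0]])
# Measured post-editing time per sentence, in seconds.
y_train = np.array([14.0, 21.0, 55.0, 3.0])

model = LinearRegression().fit(X_train, y_train)

# Score a GEC system by summing predicted PE time over its outputs:
# lower = less human effort left after the tool's first pass.
X_system = np.array([[1, 0, 0], [0, 1, 1]])
print(f"Estimated remaining PE time: {model.predict(X_system).sum():.1f}s")
```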

๐Ÿ“ Abstract
Text editing can involve several iterations of revision. Incorporating an efficient Grammar Error Correction (GEC) tool in the initial correction round can significantly impact further human editing effort and final text quality. This raises an interesting question for quantifying GEC tool usability: how much effort can a GEC tool save users? We present the first large-scale dataset of post-editing (PE) time annotations and corrections for two English GEC test datasets (BEA19 and CoNLL14). We introduce Post-Editing Effort in Time (PEET) as a human-focused evaluation scorer that ranks any GEC tool by estimating its PE time-to-correct. Using our dataset, we quantify the amount of time GEC tools save in text editing. Analyzing edit types indicated that determining whether a sentence needs correction, along with edits like paraphrasing and punctuation changes, had the greatest impact on PE time. Finally, comparison with human rankings shows that PEET correlates well with technical effort judgments, providing a new human-centric direction for evaluating GEC tool usability. We release our dataset and code at: https://github.com/ankitvad/PEET_Scorer.
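To make the abstract's ranking comparison concrete, here is a minimal sketch of the kind of rank-agreement check it describes, using SciPy's `spearmanr`; the tool scores and human ranks are invented for illustration, not the paper's results.

```python
# Toy rank-agreement check between PEET-style scores and human effort
# rankings, mirroring the Spearman's rho comparison in the abstract.
from scipy.stats import spearmanr

# Estimated PE time in seconds for five GEC tools (lower = better).
peet_seconds = [120.5, 98.2, 150.0, 87.4, 110.3]
# Human effort ranking of the same tools (1 = least effort).
human_rank = [4, 1, 5, 2, 3]

rho, p_value = spearmanr(peet_seconds, human_rank)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
```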
Problem

Research questions and friction points this paper is trying to address.

Quantifying human post-editing time for grammar error correction tools
Evaluating GEC tool usability through time-saving estimation metrics
Analyzing edit types impacting human correction effort in text editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduced PEET scorer for GEC tool evaluation
Created large-scale dataset with post-editing time annotations
Quantified time savings by analyzing edit-type impact (see the sketch below)
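One simple way to read edit-type impact off a fitted time model, continuing the toy regression from the summary above, is to inspect its per-edit-type coefficients as an approximate time cost per edit. This is a hedged sketch with invented values, not the authors' analysis.

```python
# Toy illustration: fitted coefficients as approximate seconds per edit,
# one way to ask which edit types drive PE time. Values are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

edit_types = ["punctuation", "paraphrase", "spelling"]
X = np.array([[2, 0, 1], [0, 1, 0], [1, 2, 3], [0, 0, 0], [3, 1, 0]])
y = np.array([14.0, 21.0, 55.0, 3.0, 40.0])

coef = LinearRegression().fit(X, y).coef_
for name, seconds in sorted(zip(edit_types, coef), key=lambda t: -t[1]):
    print(f"{name}: ~{seconds:.1f}s per edit")
```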
Ankit Vadehra
University of Waterloo, Vector Institute
Bill Johnson
Scribendi Inc.
Gene Saunders
Scribendi Inc.
Pascal Poupart
University of Waterloo
Artificial Intelligence · Machine Learning · Reinforcement Learning · Federated Learning · NLP