FEANEL: A Benchmark for Fine-Grained Error Analysis in K-12 English Writing

📅 2025-11-28
🤖 AI Summary
Problem: Existing large language models (LLMs) exhibit limited capability in fine-grained error identification and pedagogically grounded feedback generation for K–12 English writing. Method: We introduce FEANEL—the first fine-grained English writing error analysis benchmark for foundational education—comprising 1,000 student essays. Grounded in a linguistics-informed, part-of-speech–aware error taxonomy co-designed with language education experts, we propose a multidimensional annotation framework covering error type, severity level, and interpretable, instructionally appropriate feedback. High-quality annotations were produced via expert human labeling under rigorous, multi-tiered annotation guidelines. Contribution/Results: We systematically evaluate leading LLMs across three core pedagogical dimensions: error localization, severity classification, and feedback generation. Results reveal substantial deficiencies—particularly in precise error localization and pedagogically sound feedback—underscoring the urgent need for education-specific model adaptation and optimization.
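To make the multidimensional annotation framework concrete, the sketch below models one error record with the dimensions the summary names (error type, severity, feedback, within a part-of-speech-aware taxonomy). The field names, span encoding, and severity labels are illustrative assumptions, not FEANEL's actual schema.

```python
from dataclasses import dataclass

@dataclass
class ErrorAnnotation:
    """One hypothetical FEANEL-style error record (field names assumed)."""
    essay_id: str
    span: tuple            # (start, end) character offsets of the erroneous text
    pos_category: str      # part-of-speech bucket from the taxonomy
    error_type: str        # fine-grained error type within that bucket
    severity: str          # assumed scale, e.g. "minor" / "moderate" / "severe"
    feedback: str          # instructionally appropriate explanation

# Example record for a verb-tense error in one essay.
ann = ErrorAnnotation(
    essay_id="essay_0001",
    span=(42, 47),
    pos_category="verb",
    error_type="tense_agreement",
    severity="moderate",
    feedback="Use the past tense 'went' to match the time frame of the story.",
)
```

A structured record like this is what makes the three evaluation dimensions separable: localization scores the span, classification scores the severity, and feedback generation scores the explanation.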

Technology Category
Natural Language Processing · Large Language Model
📝 Abstract
Large Language Models (LLMs) have transformed artificial intelligence, offering profound opportunities for educational applications. However, their ability to provide fine-grained educational feedback on K-12 English writing remains underexplored. In this paper, we challenge the error-analysis and pedagogical skills of LLMs by introducing the problem of Fine-grained Error Analysis for English Learners and presenting the Fine-grained Error ANalysis for English Learners (FEANEL) Benchmark. The benchmark comprises 1,000 essays written by elementary and secondary school students, together with a well-developed English writing error taxonomy. Each error is annotated by language education experts with its type, severity, and explanatory feedback, using a part-of-speech-based taxonomy the experts co-developed. We evaluate state-of-the-art LLMs on the FEANEL Benchmark to probe their error-analysis and pedagogical abilities. Experimental results reveal significant gaps in current LLMs' ability to perform fine-grained error analysis, highlighting the need for specialized methods for educational applications.
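One way such a benchmark can quantify the "significant gaps" in error analysis is span-level matching between model predictions and expert annotations. The sketch below scores error localization with exact-match precision/recall/F1 over (start, end, type) spans; the span format and exact-match criterion are assumptions for illustration, not FEANEL's published protocol.

```python
def span_f1(gold, pred):
    """Exact-match precision/recall/F1 over (start, end, error_type) spans."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # spans matching on position AND type
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Expert annotations vs. a model's predictions on one essay:
gold = [(3, 5, "verb_tense"), (12, 13, "article"), (20, 24, "word_choice")]
pred = [(3, 5, "verb_tense"), (12, 13, "preposition")]
p, r, f = span_f1(gold, pred)  # p=0.5, r≈0.333, f=0.4
```

Note that the second prediction gets no credit despite the correct span, because the error type differs; relaxed variants (position-only overlap, partial credit) are common alternatives in error-annotation evaluation.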
Problem

Research questions and friction points this paper is trying to address.

Evaluates LLMs' fine-grained error analysis in K-12 English writing
Assesses pedagogical feedback capabilities of LLMs for student essays
Identifies gaps in LLMs' performance on detailed error categorization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces FEANEL benchmark for fine-grained error analysis
Uses expert-annotated essays with part-of-speech error taxonomy
Evaluates LLMs to identify gaps in educational feedback methods
Jingheng Ye
Squirrel Ai Learning, Tsinghua University
Shen Wang
Squirrel Ai Learning
Jiaqi Chen
Tsinghua University
Hebin Wang
Tsinghua University
Deqing Zou
Tsinghua University
Yanyu Zhu
Tsinghua University
Jiwei Tang
Tsinghua University
Hai-Tao Zheng
Tsinghua University
Ruitong Liu
Tsinghua University
Haoyang Li
Squirrel Ai Learning
Yanfeng Wang
Shanghai Jiao Tong University
Qingsong Wen
Squirrel Ai Learning