🤖 AI Summary
Existing large language models (LLMs) exhibit limited capability in fine-grained error identification and pedagogically grounded feedback generation for K–12 English writing. Method: We introduce FEANEL, the first fine-grained English writing error analysis benchmark for foundational education, comprising 1,000 student essays. Grounded in a linguistics-informed, part-of-speech-aware error taxonomy co-designed with language education experts, we propose a multidimensional annotation framework covering error type, severity level, and interpretable, instructionally appropriate feedback. High-quality annotations were produced through expert human labeling under rigorous, multi-tiered annotation guidelines. Contribution/Results: We systematically evaluate leading LLMs across three core pedagogical dimensions: error localization, severity classification, and feedback generation. Results reveal substantial deficiencies, particularly in precise error localization and pedagogically sound feedback, underscoring the urgent need for education-specific model adaptation and optimization.
📝 Abstract
Large Language Models (LLMs) have transformed artificial intelligence, offering profound opportunities for educational applications. However, their ability to provide fine-grained educational feedback on K–12 English writing remains underexplored. In this paper, we challenge the error analysis and pedagogical skills of LLMs by introducing the task of Fine-grained Error Analysis for English Learners and presenting the Fine-grained Error ANalysis for English Learners (FEANEL) Benchmark. The benchmark comprises 1,000 essays written by elementary and secondary school students, together with a well-developed English writing error taxonomy. Language education experts annotated each error, categorizing it by type, severity, and explanatory feedback using a part-of-speech-based taxonomy they co-developed. We evaluate state-of-the-art LLMs on the FEANEL Benchmark to probe their error analysis and pedagogical abilities. Experimental results reveal significant gaps in current LLMs' ability to perform fine-grained error analysis, highlighting the need for specialized methods tailored to educational applications.
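To make the annotation and evaluation setup concrete, the multidimensional record described above (error location, type, severity, feedback) can be sketched as a simple data structure. The field names, category labels, and exact-match scoring below are illustrative assumptions for this sketch, not the benchmark's actual schema or metric:

```python
from dataclasses import dataclass

@dataclass
class ErrorAnnotation:
    # Hypothetical fields; the real FEANEL schema may differ.
    span: tuple[int, int]   # character offsets of the error in the essay
    error_type: str         # e.g. a part-of-speech-based category like "verb tense"
    severity: str           # e.g. "minor" or "major" (assumed labels)
    feedback: str           # instructional explanation for the student

def localization_match(pred: ErrorAnnotation, gold: ErrorAnnotation) -> bool:
    """Exact-span agreement: one simple way to score error localization."""
    return pred.span == gold.span

gold = ErrorAnnotation((10, 14), "verb tense", "major", "Use the past tense here.")
pred = ErrorAnnotation((10, 14), "verb form", "minor", "Check the verb.")
print(localization_match(pred, gold))  # True: the spans coincide
```

Under a representation like this, the three evaluation dimensions map naturally onto comparing `span` (localization), `severity` (classification), and `feedback` (generation quality) between model output and expert annotation.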