🤖 AI Summary
This work introduces the first end-to-end prompt-tuning framework for the Bengali Grammatical Error Explanation (BGEE) task. The method decomposes BGEE into three sequential stages: grammatical error detection and classification, corrected-sentence generation, and natural-language explanation generation. Leveraging GPT-4, GPT-3.5 Turbo, and Llama-2-70b, it employs structured, task-decomposed prompting together with human-in-the-loop evaluation. Its key contribution is jointly improving error detection accuracy, correction quality, and explanation comprehensibility, thereby mitigating error misattribution and inaccurate explanations. Experimental results show that GPT-4, the best-performing LLM, achieves a 5.26% improvement in F1 score and a 6.95% gain in exact match under automated evaluation; error-type misclassification decreases by 25.51%, and the erroneous-explanation rate drops by 26.27%.
📝 Abstract
We propose a novel three-step prompt-tuning method for Bengali Grammatical Error Explanation (BGEE) using state-of-the-art large language models (LLMs) such as GPT-4, GPT-3.5 Turbo, and Llama-2-70b. Our approach involves identifying and categorizing grammatical errors in Bengali sentences, generating corrected versions of the sentences, and providing natural-language explanations for each identified error. We evaluate the performance of our BGEE system using both automated evaluation metrics and human evaluation conducted by experienced Bengali language experts. Under our proposed prompt-tuning approach, GPT-4, the best-performing LLM, surpasses the baseline model on automated evaluation metrics, with a 5.26% improvement in F1 score and a 6.95% improvement in exact match. Furthermore, compared to the previous baseline, GPT-4 reduces the wrong-error-type rate by 25.51% and the wrong-error-explanation rate by 26.27%. However, the results still lag behind the human baseline.
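The three sequential stages described above could be sketched as a simple prompting pipeline. This is a hypothetical illustration only: the function names, prompt wording, and the `call_llm` stand-in (which would be a real chat-completion API such as GPT-4 in practice) are all assumptions, not the authors' actual implementation.

```python
from typing import Callable, List

def build_bgee_prompts(sentence: str) -> List[str]:
    """Build the three sequential BGEE prompts: detect/classify errors,
    generate a corrected sentence, explain each error (illustrative wording)."""
    detect = (
        "Identify and categorize every grammatical error in the following "
        f"Bengali sentence: {sentence}"
    )
    correct = (
        "Given the errors identified above, generate a corrected version "
        f"of the sentence: {sentence}"
    )
    explain = (
        "For each identified error, provide a natural-language explanation "
        "of why it is wrong and how the correction fixes it."
    )
    return [detect, correct, explain]

def run_bgee(sentence: str, call_llm: Callable[[str], str]) -> List[str]:
    """Run the three stages in order, carrying each stage's output forward
    as context for the next prompt."""
    context = ""
    outputs = []
    for prompt in build_bgee_prompts(sentence):
        response = call_llm(context + prompt)
        outputs.append(response)
        context += prompt + "\n" + response + "\n"
    return outputs
```

Chaining the stages this way means the correction prompt sees the detected error types, and the explanation prompt sees both, which is the intended benefit of task decomposition over a single monolithic prompt.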