🤖 AI Summary
This study addresses the challenge of accurately assessing second language learners’ grammatical proficiency and delivering fine-grained formative feedback. Building upon the English Grammar Profile (EGP), we propose a novel analytical framework that integrates learners’ original and corrected sentences, marking the first integration of the EGP with large language models (LLMs) to automatically identify learners’ attempts at, and successful use of, grammatical structures aligned with CEFR levels. Methodologically, we introduce a hybrid rule-LLM paradigm that combines a rule-based system, an LLM classifier, and an automatic grammatical error correction module to construct both semi-automatic and fully automatic analysis pipelines. Experimental results demonstrate that LLMs outperform rule-based approaches on semantically and pragmatically complex constructions, that the hybrid pipeline achieves the best overall CEFR-level prediction performance, and that the fully automatic pipeline approaches human-level accuracy in detecting successful grammatical attempts.
📝 Abstract
Evaluating the grammatical competence of second language (L2) learners is essential both for providing targeted feedback and for assessing proficiency. To achieve this, we propose a novel framework leveraging the English Grammar Profile (EGP), a taxonomy of grammatical constructs mapped to the proficiency levels of the Common European Framework of Reference (CEFR), to detect learners' attempts at grammatical constructs and classify them as successful or unsuccessful. This detection can then be used to provide fine-grained feedback. Moreover, automatically detected attempts at these constructs serve as predictors of holistic CEFR proficiency. For detecting grammatical constructs derived from the EGP, we compare rule-based and LLM-based classifiers. We show that LLMs outperform rule-based methods on semantically and pragmatically nuanced constructs, while rule-based approaches remain competitive for constructs that rely purely on morphological or syntactic features and do not require semantic interpretation. For proficiency assessment, we evaluate both rule-based and hybrid pipelines and show that a hybrid approach combining a rule-based pre-filter with an LLM consistently yields the strongest performance. Since our framework operates on pairs of original learner sentences and their corrected counterparts, we also evaluate a fully automated pipeline using automatic grammatical error correction. This pipeline closely approaches the performance of semi-automated systems based on manual corrections, particularly for the detection of successful attempts at grammatical constructs. Overall, our framework emphasises learners' successful attempts in addition to unsuccessful ones, enabling positive, formative feedback and providing actionable insights into grammatical development.
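To make the hybrid rule-LLM design concrete, here is a minimal sketch of the pipeline's control flow. Everything in it is illustrative rather than the paper's implementation: the regex stands in for one purely morphological EGP construct (where rule-based matching stays competitive), the `llm_confirms` function is a stub for the LLM classifier, and an attempt counts as successful only if the original learner sentence already realises the construct found in the corrected one.

```python
import re
from dataclasses import dataclass


@dataclass
class Attempt:
    construct: str
    successful: bool


# Hypothetical rule-based pre-filter for one EGP construct:
# past simple of regular verbs ("-ed" forms), a morphological
# pattern that a rule can match cheaply before invoking an LLM.
PAST_SIMPLE = re.compile(r"\b\w+ed\b", re.IGNORECASE)


def llm_confirms(construct: str, sentence: str) -> bool:
    """Stub for the LLM classifier. A real system would prompt an LLM
    to verify semantically or pragmatically nuanced constructs; here
    every rule-matched candidate is accepted."""
    return True


def detect_attempts(original: str, corrected: str) -> list[Attempt]:
    """Hybrid pipeline sketch: the rule-based pre-filter proposes
    candidate constructs in the corrected sentence, the (stubbed) LLM
    confirms them, and the attempt is marked successful only if the
    original sentence already contains the construct."""
    attempts = []
    if PAST_SIMPLE.search(corrected) and llm_confirms("past simple", corrected):
        attempts.append(
            Attempt("past simple",
                    successful=bool(PAST_SIMPLE.search(original)))
        )
    return attempts


# Learner wrote "walk" where the correction has "walked": the construct
# was attempted (it appears in the correction) but not realised, so the
# attempt is detected as unsuccessful.
print(detect_attempts("Yesterday I walk to school.",
                      "Yesterday I walked to school."))
```

In a fully automated variant, the `corrected` argument would come from an automatic grammatical error correction module instead of a manual correction, which is the substitution the abstract evaluates.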