🤖 AI Summary
This study addresses fine-grained challenges in assessing lexical usage in second-language (L2) writing, particularly polysemy, contextual variation, and multi-word expressions. It proposes a context-aware, word-level evaluation method that integrates large language models (LLMs) with the English Vocabulary Profile (EVP). It is the first work to apply the EVP for dynamic, sentence-level lexical proficiency annotation, overcoming the limitations of traditional, static part-of-speech (PoS)-based assessment. The method assigns CEFR levels to individual words within authentic sentences and finds a moderate correlation (r ≈ 0.56) between word-level proficiency scores and overall text-level CEFR levels. Experiments indicate that the LLM+EVP approach outperforms the PoS-based baseline; the approach is also used to examine the consistency of EVP proficiency levels on real-world L2 writing data. This work introduces a paradigm for automated, interpretable, and context-sensitive L2 lexical assessment.
📝 Abstract
Vocabulary use is a fundamental aspect of second language (L2) proficiency. To date, its assessment by automated systems has typically examined the context-independent or part-of-speech (PoS)-related use of words. This paper introduces a novel approach to fine-grained vocabulary evaluation that exploits the precise use of words within a sentence. The scheme combines large language models (LLMs) with the English Vocabulary Profile (EVP), a standard lexical resource that enables in-context vocabulary use to be linked with proficiency level. We evaluate the ability of LLMs to assign proficiency levels to individual words as they appear in L2 learner writing, addressing key challenges such as polysemy, contextual variation, and multi-word expressions. We compare LLMs to a PoS-based baseline. LLMs appear to exploit additional semantic information that yields improved performance. We also explore correlations between word-level proficiency and essay-level proficiency. Finally, the approach is applied to examine the consistency of the EVP proficiency levels. Results show that LLMs are well-suited to the task of vocabulary assessment.
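To make the word-level versus essay-level comparison concrete, the following is a minimal sketch of how such a correlation could be computed. It is not the authors' pipeline. The numeric mapping of CEFR levels to 1-6, the use of a simple mean over word levels, the helper names `essay_word_score` and `pearson_r`, and the toy essays are all assumptions made for illustration; the values are invented and are not drawn from the EVP or from any learner corpus.

```python
from math import sqrt
from statistics import mean

# Assumed ordinal mapping of CEFR levels to numbers (illustrative choice).
CEFR_TO_NUM = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}


def essay_word_score(word_levels):
    """Mean numeric CEFR level over the per-word labels of one essay."""
    return mean(CEFR_TO_NUM[level] for level in word_levels)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data (invented): per-word levels as a labeller might assign them,
# paired with a holistic essay-level CEFR grade.
essays = [
    (["A1", "A2", "A1", "B1"], "A2"),
    (["A2", "B1", "B1", "A2"], "B1"),
    (["B1", "B2", "A2", "C1"], "B2"),
    (["B2", "C1", "B1", "C2"], "C1"),
]

word_scores = [essay_word_score(words) for words, _ in essays]
essay_levels = [CEFR_TO_NUM[level] for _, level in essays]
r = pearson_r(word_scores, essay_levels)
print(f"Pearson r on toy data: {r:.3f}")
```

The toy data are deliberately near-monotonic, so the resulting coefficient is far higher than would be expected on real learner writing; only the computation itself is meant to carry over.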