🤖 AI Summary
This study addresses the automatic CEFR-level classification of German learner texts. Methodologically, it introduces a multi-source training paradigm that integrates authentic annotated corpora with high-quality synthetic data. The approach combines prompt engineering, fine-tuning of LLaMA-3-8B-Instruct, and an interpretable probing technique that classifies texts from the model's internal neural states, enabling modeling of linguistic competence features at multiple levels of granularity. Its key contribution is the first application of synthetic-data-driven representation probing to CEFR proficiency assessment, which improves both generalizability and interpretability. Experiments show consistent accuracy improvements over prior methods across multiple benchmarks, supporting the effectiveness and robustness of large language models for automated language proficiency assessment.
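To make the prompt-engineering component concrete, here is a minimal zero-shot sketch in Python using the Hugging Face `transformers` library. The model ID is the public LLaMA-3-8B-Instruct checkpoint; the system prompt, decoding settings, and the `classify_cefr` helper are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical zero-shot prompting sketch; the paper's actual prompts may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def classify_cefr(text: str) -> str:
    """Ask the instruct model for a single CEFR label (A1-C2)."""
    messages = [
        {"role": "system",
         "content": "You are a CEFR rater for German learner texts. "
                    "Answer with exactly one label: A1, A2, B1, B2, C1, or C2."},
        {"role": "user", "content": text},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=4, do_sample=False)
    # Decode only the newly generated tokens after the prompt.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    ).strip()

print(classify_cefr("Ich heiße Anna und ich wohne in Berlin."))
```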
📝 Abstract
Assessing language proficiency is essential for education, as it enables instruction tailored to learners' needs. This paper investigates the use of Large Language Models (LLMs) for automatically classifying German texts into proficiency levels according to the Common European Framework of Reference for Languages (CEFR). To support robust training and evaluation, we construct a diverse dataset by combining multiple existing CEFR-annotated corpora with synthetic data. We then evaluate prompt-engineering strategies, fine-tuning of a LLaMA-3-8B-Instruct model, and a probing-based approach that uses the internal neural states of the LLM for classification. Our results show a consistent performance improvement over prior methods, highlighting the potential of LLMs for reliable and scalable CEFR classification.
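As an illustration of the probing-based approach, the sketch below mean-pools the hidden states of one intermediate layer of LLaMA-3-8B-Instruct and fits a logistic-regression probe on top. The layer index, pooling strategy, probe type, and the two toy examples are assumptions for illustration, not the paper's reported setup.

```python
# Hypothetical probing sketch: classify CEFR levels from internal hidden states.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, output_hidden_states=True, device_map="auto"
)
model.eval()

LAYER = 16  # illustrative middle layer; the paper's layer choice may differ

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    """Mean-pool one layer's hidden states into a fixed-size text vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True).to(model.device)
    hidden = model(**inputs).hidden_states[LAYER]  # shape: (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0).float().cpu()

# Toy stand-ins for the CEFR-annotated training texts and labels.
texts = ["Ich wohne in Berlin.",
         "Die Globalisierung stellt uns vor komplexe gesellschaftliche Fragen."]
labels = ["A1", "C1"]

X = torch.stack([embed(t) for t in texts]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print(probe.predict(X))
```

Because the frozen LLM only supplies representations while the lightweight probe does the classification, this style of setup keeps the decision function inspectable and cheap to retrain on new annotated or synthetic data.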