Classifying German Language Proficiency Levels Using Large Language Models

📅 2025-12-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses automatic CEFR-level classification of German learner texts. Methodologically, it introduces a multi-source training paradigm that integrates authentic annotated corpora with high-quality synthetic data. The approach combines prompt engineering, fine-tuning of LLaMA-3-8B-Instruct, and an interpretable probing technique grounded in the model's internal neural states, enabling multi-granular modeling of linguistic competence features. Its key contribution is the first application of synthetic-data-driven representation probing to CEFR proficiency assessment, simultaneously improving generalizability and interpretability. Experiments show substantial accuracy gains over prior state-of-the-art methods across multiple benchmarks, supporting the effectiveness and robustness of large language models for automated language proficiency evaluation.

📝 Abstract
Assessing language proficiency is essential for education, as it enables instruction tailored to learners' needs. This paper investigates the use of Large Language Models (LLMs) for automatically classifying German texts into proficiency levels according to the Common European Framework of Reference for Languages (CEFR). To support robust training and evaluation, we construct a diverse dataset by combining multiple existing CEFR-annotated corpora with synthetic data. We then evaluate prompt-engineering strategies, fine-tuning of a LLaMA-3-8B-Instruct model, and a probing-based approach that uses the internal neural state of the LLM for classification. Our results show a consistent performance improvement over prior methods, highlighting the potential of LLMs for reliable and scalable CEFR classification.
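The prompt-engineering strategy described in the abstract can be sketched as a zero-shot classification prompt plus label parsing. This is a minimal illustration only: the prompt wording and the helper names (`build_cefr_prompt`, `parse_cefr_label`) are assumptions, not the paper's exact prompts or code.

```python
import re

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def build_cefr_prompt(text: str) -> str:
    """Build a zero-shot CEFR classification prompt for a German learner text.
    The wording is an illustrative assumption, not the paper's prompt."""
    return (
        "You are a CEFR rater for German learner texts.\n"
        "Classify the following text into exactly one CEFR level "
        "(A1, A2, B1, B2, C1, or C2). Answer with the level only.\n\n"
        f"Text:\n{text}\n\nLevel:"
    )

def parse_cefr_label(response: str):
    """Extract the first CEFR label from a model response, or None."""
    match = re.search(r"\b(A1|A2|B1|B2|C1|C2)\b", response)
    return match.group(1) if match else None
```

The prompt string would be sent to the instruction-tuned model (e.g. via a chat-completion API), and `parse_cefr_label` makes the pipeline robust to responses that wrap the label in extra text.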
Problem

Research questions and friction points this paper is trying to address.

Classifying German texts by CEFR proficiency levels
Using LLMs for automated language assessment
Improving classification accuracy with diverse data and methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining multiple existing corpora with synthetic data
Evaluating prompt-engineering and fine-tuning of LLaMA-3-8B-Instruct
Using internal neural state probing for classification
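The probing idea above can be sketched in miniature. The paper trains a classifier on the internal neural states of LLaMA-3-8B-Instruct; as a stand-in, the sketch below assumes the caller supplies mean-pooled hidden-state vectors per text and fits a simple nearest-centroid probe over them. The centroid classifier is a simplifying assumption for illustration, not the paper's actual probe architecture.

```python
from collections import defaultdict
from math import dist

class CEFRProbe:
    """Minimal nearest-centroid probe over LLM hidden states.

    Sketch only: feature vectors are assumed to be mean-pooled hidden
    states extracted from the LLM by the caller; the probe itself is a
    plain centroid model rather than the paper's classifier.
    """

    def __init__(self):
        self.centroids = {}  # CEFR level -> centroid vector

    def fit(self, vectors, labels):
        """Average the hidden-state vectors per CEFR level."""
        buckets = defaultdict(list)
        for vec, label in zip(vectors, labels):
            buckets[label].append(vec)
        for label, vecs in buckets.items():
            n = len(vecs)
            self.centroids[label] = [sum(col) / n for col in zip(*vecs)]

    def predict(self, vector):
        """Assign the level whose centroid is nearest in Euclidean distance."""
        return min(self.centroids, key=lambda lbl: dist(vector, self.centroids[lbl]))
```

In practice the input vectors would come from the model's hidden states (e.g. the final-layer activations averaged over tokens), so the probe stays lightweight while the frozen LLM does the representational work.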
Elias-Leander Ahlers
Computer Science Department, University of Münster, Münster, Germany

Witold Brunsmann
Computer Science Department, University of Münster, Münster, Germany

Malte Schilling
Professor, Autonomous Intelligent Systems Group
Machine Learning · Cognition · Robotics · Neural Networks