🤖 AI Summary
This study addresses the automatic CEFR-level classification of German learner texts. Methodologically, it introduces a multi-source training paradigm that integrates authentic annotated corpora with high-quality synthetic data. The approach combines prompt engineering, fine-tuning of LLaMA-3-8B-Instruct, and an interpretable probing technique that classifies texts from the model's internal neural states, enabling modeling of linguistic competence features at multiple levels of granularity. Its key contribution is the first application of synthetic-data-driven representation probing to CEFR proficiency assessment, which improves both generalizability and interpretability. Experiments show consistent accuracy improvements over prior methods across multiple benchmarks, supporting the effectiveness and robustness of large language models for automated language proficiency assessment.
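To make the prompt-engineering component concrete, here is a minimal zero-shot sketch in Python using the Hugging Face `transformers` library. The model ID is the public LLaMA-3-8B-Instruct checkpoint; the system prompt, decoding settings, and the `classify_cefr` helper are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical zero-shot prompting sketch; the paper's actual prompts may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def classify_cefr(text: str) -> str:
    """Ask the instruct model for a single CEFR label (A1-C2)."""
    messages = [
        {"role": "system",
         "content": "You are a CEFR rater for German learner texts. "
                    "Answer with exactly one label: A1, A2, B1, B2, C1, or C2."},
        {"role": "user", "content": text},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=4, do_sample=False)
    # Decode only the newly generated tokens after the prompt.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    ).strip()

print(classify_cefr("Ich heiße Anna und ich wohne in Berlin."))
```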
📝 Abstract
Assessing language proficiency is essential for education, as it enables instruction tailored to learners' needs. This paper investigates the use of Large Language Models (LLMs) for automatically classifying German texts into proficiency levels according to the Common European Framework of Reference for Languages (CEFR). To support robust training and evaluation, we construct a diverse dataset by combining multiple existing CEFR-annotated corpora with synthetic data. We then evaluate prompt-engineering strategies, fine-tuning of a LLaMA-3-8B-Instruct model, and a probing-based approach that uses the internal neural states of the LLM for classification. Our results show a consistent performance improvement over prior methods, highlighting the potential of LLMs for reliable and scalable CEFR classification.
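As an illustration of the probing-based approach, the sketch below mean-pools the hidden states of one intermediate layer of LLaMA-3-8B-Instruct and fits a logistic-regression probe on top. The layer index, pooling strategy, probe type, and the two toy examples are assumptions for illustration, not the paper's reported setup.

```python
# Hypothetical probing sketch: classify CEFR levels from internal hidden states.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, output_hidden_states=True, device_map="auto"
)
model.eval()

LAYER = 16  # illustrative middle layer; the paper's layer choice may differ

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    """Mean-pool one layer's hidden states into a fixed-size text vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True).to(model.device)
    hidden = model(**inputs).hidden_states[LAYER]  # shape: (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0).float().cpu()

# Toy stand-ins for the CEFR-annotated training texts and labels.
texts = ["Ich wohne in Berlin.",
         "Die Globalisierung stellt uns vor komplexe gesellschaftliche Fragen."]
labels = ["A1", "C1"]

X = torch.stack([embed(t) for t in texts]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print(probe.predict(X))
```

Because the frozen LLM only supplies representations while the lightweight probe does the classification, this style of setup keeps the decision function inspectable and cheap to retrain on new annotated or synthetic data.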