Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring

📅 2025-05-13
🤖 AI Summary
This study investigates the feasibility of large language models (LLMs) as adaptive second-language tutors, specifically whether system prompts can reliably control output text difficulty to match learners' CEFR levels (A1/B1/C1). Method: Using open-source, instruction-tuned LLMs of 7B to 12B parameters, the authors design a dual-role alternating dialogue framework with isolated history management and integrate automated CEFR-based difficulty assessment. Contribution/Results: The paper introduces the concept of "alignment drift," empirically demonstrating that while prompts temporarily constrain output difficulty, their efficacy degrades significantly over dialogue turns, rendering prompting alone insufficient for sustained personalized instruction. It establishes a low-overhead, fully automated evaluation paradigm with quantified error below 8.2%, revealing fundamental limitations of prompt engineering for long-term adaptive language teaching. These findings motivate a shift toward dynamic, context-aware difficulty-regulation mechanisms in LLM-based educational systems.

📝 Abstract
This paper investigates the potential of Large Language Models (LLMs) as adaptive tutors in the context of second-language learning. In particular, we evaluate whether system prompting can reliably constrain LLMs to generate only text appropriate to the student's competence level. We simulate full teacher-student dialogues in Spanish using instruction-tuned, open-source LLMs ranging in size from 7B to 12B parameters. Dialogues are generated by having an LLM alternate between tutor and student roles with separate chat histories. The output from the tutor model is then used to evaluate the effectiveness of CEFR-based prompting to control text difficulty across three proficiency levels (A1, B1, C1). Our findings suggest that while system prompting can be used to constrain model outputs, prompting alone is too brittle for sustained, long-term interactional contexts, a phenomenon we term alignment drift. Our results provide insights into the feasibility of LLMs as personalized, proficiency-aligned adaptive tutors and provide a scalable method for low-cost evaluation of model performance without human participants.
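The dual-role setup described in the abstract can be sketched as a loop in which one model plays both parts while each role keeps its own isolated chat history. The sketch below is a minimal illustration, not the paper's implementation: the `generate` stub, the prompt wording, and the message schema are all assumptions standing in for a real call to a 7B-12B instruction-tuned LLM.

```python
# Hypothetical sketch of a dual-role alternating dialogue with isolated
# chat histories. The tutor's system prompt (with the CEFR constraint) is
# visible only in the tutor's history; the student role has its own.

CEFR_PROMPTS = {
    "A1": "You are a Spanish tutor. Use only CEFR A1-level Spanish.",
    "B1": "You are a Spanish tutor. Use only CEFR B1-level Spanish.",
    "C1": "You are a Spanish tutor. Use only CEFR C1-level Spanish.",
}

def generate(history):
    """Placeholder for an LLM call; a real system would query a model
    with `history` and return its next message."""
    return f"(reply after {len(history)} messages)"

def simulate_dialogue(level, n_turns=5):
    # Separate histories: neither role ever sees the other's system prompt.
    tutor_history = [{"role": "system", "content": CEFR_PROMPTS[level]}]
    student_history = [{"role": "system",
                        "content": f"You are a Spanish learner at CEFR level {level}."}]
    tutor_turns = []
    last_student_msg = "Hola, quiero practicar español."
    for _ in range(n_turns):
        # Tutor sees the student's message appended to its own history.
        tutor_history.append({"role": "user", "content": last_student_msg})
        tutor_msg = generate(tutor_history)
        tutor_history.append({"role": "assistant", "content": tutor_msg})
        tutor_turns.append(tutor_msg)  # later scored for CEFR difficulty

        # Student sees the tutor's message in its own, separate history.
        student_history.append({"role": "user", "content": tutor_msg})
        last_student_msg = generate(student_history)
        student_history.append({"role": "assistant", "content": last_student_msg})
    return tutor_turns

turns = simulate_dialogue("B1", n_turns=3)
```

Only the tutor-side turns are collected, since it is the tutor's output whose difficulty the CEFR assessment then evaluates.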
Problem

Research questions and friction points this paper is trying to address.

Evaluating CEFR-based prompting for controlling LLM-generated text difficulty
Assessing alignment drift in LLMs during sustained teacher-student dialogues
Exploring LLMs as adaptive tutors for Spanish language learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses CEFR-based prompting for text difficulty control
Simulates dialogues with alternating tutor-student LLM roles
Evaluates alignment drift in long-term interaction contexts
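The automated evaluation idea behind the last point can be illustrated as follows: score each tutor turn for CEFR difficulty and track its per-turn deviation from the prompted target level. This is an illustrative sketch only; `score_cefr` here is a crude word-length proxy standing in for the paper's actual automated CEFR difficulty assessment, and the function names are my own.

```python
# Hypothetical drift measurement: map CEFR levels to an ordinal scale and
# report, per tutor turn, how far the scored level sits from the target.

CEFR_SCALE = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}

def score_cefr(text):
    """Placeholder difficulty scorer using average word length as a crude
    proxy; a real system would use a trained CEFR classifier."""
    words = text.split()
    avg = sum(len(w) for w in words) / max(len(words), 1)
    if avg < 4:
        return "A1"
    if avg < 6:
        return "B1"
    return "C1"

def alignment_drift(tutor_turns, target="A1"):
    """Per-turn deviation from the target level, in CEFR scale steps.
    A sequence trending upward indicates alignment drift."""
    t = CEFR_SCALE[target]
    return [CEFR_SCALE[score_cefr(turn)] - t for turn in tutor_turns]

drift = alignment_drift(
    ["Hola. ¿Cómo estás?",
     "Hablemos de la subjuntividad y sus matices."],
    target="A1",
)
```

With per-turn scores in hand, drift can be quantified over a whole simulated dialogue without any human raters, which is what makes the evaluation low-cost and scalable.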