CoDAE: Adapting Large Language Models for Education via Chain-of-Thought Data Augmentation

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) exhibit three critical limitations in educational settings: premature answer disclosure, poor adaptability to students' cognitive uncertainty, and vulnerability to emotionally manipulative prompts. To address these issues, we propose CoDAE, a pedagogically grounded framework that combines chain-of-thought (CoT) prompting with authentic, human-annotated student dialogues to perform education-specific data augmentation, yielding targeted instructional exemplars. CoDAE fine-tunes open-source LLMs on the augmented dataset and evaluates them with a hybrid protocol combining automated metrics and LLM-as-a-judge assessment. Experimental results demonstrate significant improvements in pedagogical reasoning guidance, teaching alignment, and robustness against adversarial prompts. CoDAE effectively mitigates answer leakage and manipulation-induced responses, with consistent gains across multiple base models, supporting its generalizability. This work provides a scalable, trustworthy methodology for adaptive, student-centered AI tutoring.
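The LLM-as-a-judge assessment mentioned above can be sketched minimally. This is an illustrative sketch only, not the paper's evaluation code: `judge_reply`, `JUDGE_TEMPLATE`, and the generic `call_llm(prompt) -> str` interface are hypothetical names, and the rubric wording is assumed.

```python
# Hypothetical LLM-as-a-judge scorer for tutor replies.
# Assumes a generic call_llm(prompt) -> str backend; the paper's actual
# judge model and rubric are not specified here.

JUDGE_TEMPLATE = (
    "Rate the tutor reply on a 1-5 scale for pedagogical guidance "
    "(higher = guides the student's reasoning without revealing the answer).\n"
    "Student question: {question}\n"
    "Tutor reply: {reply}\n"
    "Respond with a single integer."
)

def judge_reply(question: str, reply: str, call_llm) -> int:
    """Ask the judge model for a 1-5 score and clamp it to the rubric range."""
    prompt = JUDGE_TEMPLATE.format(question=question, reply=reply)
    raw = call_llm(prompt).strip()
    score = int(raw.split()[0])  # tolerate trailing text after the integer
    return min(max(score, 1), 5)
```

In practice such a scorer would be averaged over many simulated dialogues and paired with automatic metrics, as the hybrid protocol describes.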

📝 Abstract
Large Language Models (LLMs) are increasingly employed as AI tutors due to their scalability and potential for personalized instruction. However, off-the-shelf LLMs often underperform in educational settings: they frequently reveal answers too readily, fail to adapt their responses to student uncertainty, and remain vulnerable to emotionally manipulative prompts. To address these challenges, we introduce CoDAE, a framework that adapts LLMs for educational use through Chain-of-Thought (CoT) data augmentation. We collect real-world dialogues between students and a ChatGPT-based tutor and enrich them using CoT prompting to promote step-by-step reasoning and pedagogically aligned guidance. Furthermore, we design targeted dialogue cases to explicitly mitigate three key limitations: over-compliance, low response adaptivity, and threat vulnerability. We fine-tune four open-source LLMs on different variants of the augmented datasets and evaluate them in simulated educational scenarios using both automatic metrics and LLM-as-a-judge assessments. Our results show that models fine-tuned with CoDAE deliver more pedagogically appropriate guidance, better support reasoning processes, and effectively resist premature answer disclosure.
Problem

Research questions and friction points this paper is trying to address.

LLMs underperform as tutors by revealing answers too quickly
LLMs fail to adapt responses to student uncertainty
LLMs are vulnerable to emotionally manipulative prompts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought data augmentation for LLMs
Fine-tuning with pedagogically enriched dialogues
Mitigating over-compliance and threat vulnerability
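The CoT augmentation and answer-leakage mitigation listed above can be sketched in a minimal form. This is an assumed illustration, not the paper's pipeline: `build_augmentation_prompt`, `leaks_answer`, `augment_dialogue`, and the `call_llm` interface are hypothetical names, and the template wording is invented for the sketch.

```python
# Hypothetical sketch of CoT data augmentation for tutoring dialogues:
# rewrite each tutor turn into step-by-step guidance, then filter out
# rewrites that still disclose the final answer.

COT_TEMPLATE = (
    "You are a tutor. Rewrite the reply below so it guides the student "
    "step by step without revealing the final answer.\n"
    "Question: {question}\n"
    "Original reply: {reply}\n"
    "Guided reply:"
)

def build_augmentation_prompt(question: str, reply: str) -> str:
    """Fill the CoT rewriting template for one dialogue turn."""
    return COT_TEMPLATE.format(question=question, reply=reply)

def leaks_answer(reply: str, answer: str) -> bool:
    """Crude leakage filter: reject replies that state the answer verbatim."""
    return answer.lower() in reply.lower()

def augment_dialogue(turns, answer, call_llm):
    """Rewrite each (question, tutor_reply) turn; keep only non-leaking ones."""
    augmented = []
    for question, reply in turns:
        new_reply = call_llm(build_augmentation_prompt(question, reply))
        if not leaks_answer(new_reply, answer):
            augmented.append((question, new_reply))
    return augmented
```

The surviving (question, guided reply) pairs would then form the fine-tuning set; a real pipeline would use a stronger leakage check than substring matching.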