Do LLMs Make Mistakes Like Students? Exploring Natural Alignment between Language Models and Human Error Patterns

📅 2025-02-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether large language models (LLMs) exhibit cognitive alignment with students in multiple-choice question (MCQ) distractor selection—specifically, whether LLMs’ predicted probabilities of generating incorrect options (distractors) correlate with empirically observed student misselection frequencies. Method: Leveraging real-world educational datasets, we conduct probability distribution correlation analysis (Spearman’s ρ), cross-model comparison, and distractor-level alignment evaluation. Contribution/Results: We provide the first empirical evidence that, when LLMs err, they significantly prefer the most frequently selected student distractors—a robust cognitive bias alignment observed even in smaller-scale models. LLM-generated distractor probabilities show moderate positive correlation with student misselection distributions (ρ ≈ 0.4–0.6), confirming cross-scale consistency between model-generated and human misconceptions. These findings establish a novel, empirically grounded paradigm for automated, cognitively plausible distractor generation in educational assessment.
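As a concrete illustration of the correlation analysis described above, here is a minimal sketch that computes Spearman's ρ between model-assigned distractor probabilities and observed student misselection frequencies for a single item. The function name, data layout, and example numbers are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the distractor-alignment correlation (Spearman's rho),
# assuming each MCQ record carries (a) the model's probability over each
# incorrect option and (b) the fraction of students who mis-selected it.
from scipy.stats import spearmanr

def distractor_alignment(llm_probs, student_freqs):
    """Spearman's rho between model-assigned distractor probabilities
    and student misselection frequencies for one MCQ.

    llm_probs     -- P(model generates distractor d), one value per distractor
    student_freqs -- fraction of wrong answers choosing d, same order
    """
    rho, p_value = spearmanr(llm_probs, student_freqs)
    return rho, p_value

# Example with three distractors (hypothetical numbers):
rho, p = distractor_alignment([0.50, 0.30, 0.20], [0.55, 0.25, 0.20])
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```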

📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in various educational tasks, yet their alignment with human learning patterns, particularly in predicting which incorrect options students are most likely to select in multiple-choice questions (MCQs), remains underexplored. Our work investigates the relationship between LLM generation likelihood and student response distributions in MCQs, with a specific focus on distractor selection. We collect a comprehensive dataset of MCQs with real-world student response distributions to explore two fundamental research questions: (1) RQ1: Do the distractors that students select more frequently correspond to those to which LLMs assign higher generation likelihood? (2) RQ2: When an LLM selects an incorrect choice, does it choose the same distractor that most students pick? Our experiments reveal moderate correlations between LLM-assigned probabilities and student selection patterns for distractors in MCQs. Additionally, when LLMs make mistakes, they are more likely to select the same incorrect answers that commonly mislead students, a pattern consistent across both small and large language models. Our work provides empirical evidence that, despite LLMs' strong performance in generating educational content, a gap remains between LLMs' underlying reasoning processes and the human cognitive processes involved in identifying confusing distractors. Our findings also have significant implications for educational assessment development: smaller language models could be efficiently utilized for automated distractor generation, as they identify confusing answer choices in patterns similar to larger models. This observed alignment between LLMs and student misconception patterns opens new opportunities for generating high-quality distractors that complement traditional human-designed ones.
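To make RQ2 concrete, the sketch below estimates, over the items a model answers incorrectly, how often its choice coincides with the distractor students most frequently mis-select. The record fields (`model_choice`, `correct`, `student_counts`) are hypothetical names used for illustration; the paper's actual data format may differ.

```python
# Hedged sketch of the RQ2 check: among items the model gets wrong,
# how often does its chosen option match the distractor that students
# mis-select most often?
def error_agreement_rate(items):
    """items: iterable of dicts with keys
       'model_choice'   -- option the LLM selected
       'correct'        -- the keyed correct option
       'student_counts' -- {option: number of students choosing it}
    """
    wrong = [it for it in items if it["model_choice"] != it["correct"]]
    if not wrong:
        return float("nan")  # model made no errors on this set
    hits = 0
    for it in wrong:
        # Most popular *incorrect* option among students.
        distractors = {opt: n for opt, n in it["student_counts"].items()
                       if opt != it["correct"]}
        top_distractor = max(distractors, key=distractors.get)
        hits += it["model_choice"] == top_distractor
    return hits / len(wrong)
```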
Problem

Research questions and friction points this paper is trying to address.

Explores LLM alignment with human error patterns
Investigates LLM and student distractor selection correlation
Assesses LLM capability in educational assessment tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes whether LLM error patterns mirror student mistakes
Correlates LLM mistakes with common student misconceptions
Uses small models for automated distractor generation (see the sketch below)
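One plausible way to obtain the option-level generation likelihoods analyzed above is to score each answer option with a small causal language model. The sketch below does this with GPT-2 via Hugging Face `transformers`; the model choice and prompt format are assumptions for illustration, not the paper's exact setup.

```python
# Illustrative sketch: score an answer option by the log-probability a
# small causal LM assigns to its tokens as a continuation of the question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # any small causal LM
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def option_logprob(question, option):
    """Sum of log-probabilities of the option tokens given the question."""
    prompt_ids = tok(question, return_tensors="pt").input_ids
    option_ids = tok(" " + option, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, option_ids], dim=1)
    with torch.no_grad():
        logprobs = model(ids).logits.log_softmax(-1)
    total = 0.0
    start = prompt_ids.shape[1]
    for i in range(option_ids.shape[1]):
        token_id = ids[0, start + i]
        # Logits at position t predict the token at position t + 1.
        total += logprobs[0, start + i - 1, token_id].item()
    return total

# Comparing scores across options yields the distractor probabilities
# that can then be correlated with student misselection frequencies.
```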