Cognitive-Uncertainty Guided Knowledge Distillation for Accurate Classification of Student Misconceptions

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses key challenges in identifying student misconceptions: data scarcity, high annotation noise, the pretraining bias and deployment difficulty of large models, and the tendency of small models to overfit. It proposes a two-stage knowledge distillation framework. In the first stage, task-specific capabilities are transferred from a teacher model to a compact student model. The second stage introduces a dual-layer margin-based sample selection mechanism grounded in cognitive uncertainty to identify four categories of critical samples, coupled with a difficulty-adaptive strategy that dynamically blends hard and soft labels to sharpen discrimination between ambiguous error types. With augmented training on only 10.30% of filtered, high-value samples, the method achieves a MAP@3 of 0.9585 (+17.8%) on the MAP-Charting dataset. With only a 4B-parameter model, it reaches 84.38% accuracy on cross-topic tests of middle school algebra misconception benchmarks, outperforming a state-of-the-art large language model (67.73%) and a standard fine-tuned 72B-parameter model (81.25%).
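For readers unfamiliar with the MAP@3 metric reported here, the following is a minimal sketch of how it is commonly computed for single-label classification: each sample scores 1/rank if the true label appears among the top three ranked predictions, and 0 otherwise. This is an assumption about the evaluation protocol, not the authors' evaluation code; the function name `map_at_3` and the example labels are hypothetical.

```python
from typing import Sequence


def map_at_3(true_labels: Sequence[str],
             ranked_predictions: Sequence[Sequence[str]]) -> float:
    """Mean average precision at 3, assuming exactly one true label per sample."""
    if len(true_labels) != len(ranked_predictions):
        raise ValueError("true_labels and ranked_predictions must have equal length")
    if not true_labels:
        raise ValueError("at least one sample is required")

    total = 0.0
    for truth, preds in zip(true_labels, ranked_predictions):
        # Only the first three ranked predictions count toward the score.
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == truth:
                total += 1.0 / rank  # 1, 1/2, or 1/3 depending on rank
                break
    return total / len(true_labels)


# Hypothetical misconception labels, for illustration only.
truths = ["sign_error", "order_of_ops", "distribution"]
preds = [
    ["sign_error", "order_of_ops", "other"],    # hit at rank 1 -> 1.0
    ["sign_error", "order_of_ops", "other"],    # hit at rank 2 -> 0.5
    ["sign_error", "order_of_ops", "other"],    # miss           -> 0.0
]
score = map_at_3(truths, preds)
```

Under this definition, a model that always ranks the correct label first scores 1.0, so a value of 0.9585 indicates that the correct misconception is usually ranked first and nearly always within the top three.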

📝 Abstract

Accurately identifying student misconceptions is crucial for personalized education but faces three challenges: (1) data scarcity with a long-tail distribution, where authentic student reasoning is difficult to synthesize; (2) fuzzy boundaries between error categories, with high annotation noise; (3) a deployment paradox: large models overlook unconventional approaches due to pretraining bias and cannot be deployed on edge devices, while small models overfit to noise. Unlike traditional methods that increase diversity through large-scale data synthesis, we propose a two-stage knowledge distillation framework that mines high-value samples from existing data. The first stage performs standard distillation to transfer task capabilities. The second stage introduces a dual-layer marginal selection mechanism based on cognitive uncertainty, identifying four types of critical samples from the teacher model's uncertainty and confidence differences. For different data subsets, we design a difficulty-adaptive mechanism to balance hard- and soft-label contributions, enabling student models to inherit inter-class relationships from teacher soft labels while distinguishing ambiguous error types. Experiments show that with augmented training on only 10.30% of filtered samples, we achieve a MAP@3 of 0.9585 (+17.8%) on the MAP-Charting dataset, and using only a 4B-parameter model, we attain 84.38% accuracy on cross-topic tests of middle school algebra misconception benchmarks, significantly outperforming a state-of-the-art LLM (67.73%) and standard fine-tuned 72B models (81.25%). Our code is available at https://github.com/RoschildRui/acl2026_map.
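The idea of balancing hard- and soft-label contributions can be illustrated with a minimal sketch. The abstract does not give the loss formula or the rule for setting the balance, so everything below is an assumption: the convex combination, the names `blended_distillation_loss`, `top2_margin`, and `soft_weight`, and the use of the teacher's top-two probability gap as an uncertainty signal are all hypothetical and are not taken from the authors' repository.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def top2_margin(probs: Sequence[float]) -> float:
    """Gap between the two largest probabilities (hypothetical uncertainty proxy).

    A small margin means the teacher hesitates between two classes.
    """
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def blended_distillation_loss(student_logits: Sequence[float],
                              teacher_probs: Sequence[float],
                              hard_label: int,
                              soft_weight: float) -> float:
    """Assumed form: (1 - w) * CE(hard label) + w * CE(teacher soft labels)."""
    if not 0.0 <= soft_weight <= 1.0:
        raise ValueError("soft_weight must lie in [0, 1]")
    log_probs = log_softmax(student_logits)
    # Cross-entropy against the annotated (hard) label.
    hard_loss = -log_probs[hard_label]
    # Cross-entropy against the teacher's full distribution (soft labels).
    soft_loss = -sum(t * lp for t, lp in zip(teacher_probs, log_probs))
    return (1.0 - soft_weight) * hard_loss + soft_weight * soft_loss


# Illustrative values only: three hypothetical misconception classes.
teacher = [0.55, 0.40, 0.05]
margin = top2_margin(teacher)  # small gap: teacher is unsure
loss = blended_distillation_loss([1.2, 0.9, -0.5], teacher,
                                 hard_label=0, soft_weight=0.7)
```

In a sketch like this, a larger `soft_weight` lets the student lean on the teacher's inter-class relationships for ambiguous samples, while a smaller one trusts the annotated label; how the paper actually maps sample difficulty to that balance is not stated in the abstract.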

Problem

Research questions and friction points this paper is trying to address.

student misconceptions

data scarcity

annotation noise

model deployment

personalized education

Innovation

Methods, ideas, or system contributions that make the work stand out.

cognitive uncertainty

knowledge distillation

misconception classification