Misconception Diagnosis From Student-Tutor Dialogue: Generate, Retrieve, Rerank

📅 2026-02-02
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the challenge of timely and accurate identification of students' misconceptions in educational dialogues by proposing a three-stage approach that integrates generation, retrieval, and re-ranking. The method first leverages a large language model to generate potential misconceptions, then retrieves a candidate set based on embedding similarity with the dialogue, and finally employs a fine-tuned small model to re-rank the candidates for improved relevance. The authors state that this is the first work to apply these techniques jointly to misconception diagnosis. Evaluated on real-world instructional dialogue data, the approach significantly outperforms baseline models. Experimental results show that fine-tuning improves generation quality, ablation studies confirm that each component is necessary, and the fine-tuned small model surpasses larger closed-source counterparts, highlighting the method's efficiency and practicality.

๐Ÿ“ Abstract
Timely and accurate identification of student misconceptions is key to improving learning outcomes and pre-empting the compounding of student errors. However, this task is highly dependent on the effort and intuition of the teacher. In this work, we present a novel approach for detecting misconceptions from student-tutor dialogues using large language models (LLMs). First, we use a fine-tuned LLM to generate plausible misconceptions, and then retrieve the most promising candidates among these using embedding similarity with the input dialogue. These candidates are then assessed and re-ranked by another fine-tuned LLM to improve misconception relevance. Empirically, we evaluate our system on real dialogues from an educational tutoring platform. We consider multiple base LLM models including LLaMA, Qwen and Claude on zero-shot and fine-tuned settings. We find that our approach improves predictive performance over baseline models and that fine-tuning improves both generated misconception quality and can outperform larger closed-source models. Finally, we conduct ablation studies to both validate the importance of our generation and reranking steps on misconception generation quality.
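The generate-retrieve-rerank pipeline described above can be sketched in miniature. This is a hedged illustration, not the authors' implementation: the toy embeddings stand in for a real embedding model, the candidate misconceptions stand in for LLM generations, and the length-based scorer is a placeholder for the fine-tuned reranker LLM.

```python
import math

def cosine(u, v):
    # cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(dialogue_emb, candidates, k=2):
    # stage 2: keep the top-k generated misconceptions by
    # embedding similarity to the dialogue
    ranked = sorted(candidates,
                    key=lambda c: cosine(dialogue_emb, c[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

def rerank(dialogue, shortlist, score_fn):
    # stage 3: re-order the shortlist with a relevance scorer
    # (in the paper, a fine-tuned small LLM; here, score_fn)
    return sorted(shortlist, key=lambda c: score_fn(dialogue, c),
                  reverse=True)

# toy data: hypothetical dialogue embedding and stage-1 generations
dialogue_emb = [1.0, 0.2, 0.0]
generated = [
    ("confuses area with perimeter", [0.9, 0.3, 0.1]),
    ("adds denominators when adding fractions", [0.1, 0.9, 0.0]),
    ("ignores negative signs", [0.8, 0.1, 0.2]),
]

shortlist = retrieve(dialogue_emb, generated, k=2)
# placeholder scorer: prefers shorter candidate strings
final = rerank("dialogue text", shortlist, lambda d, c: -len(c))
print(final[0])  # → "ignores negative signs"
```

Separating cheap embedding retrieval from the more expensive reranker is what keeps the approach practical: the reranker only ever sees a small shortlist rather than every generated candidate.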
Problem

Research questions and friction points this paper is trying to address.

misconception diagnosis
student-tutor dialogue
large language models
educational AI
error identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

misconception diagnosis
large language models
dialogue-based learning
fine-tuning
reranking