🤖 AI Summary
Estimating the difficulty of multiple-choice questions (MCQs) in educational settings is costly when done by humans, and existing automated methods lack robustness. Method: This paper proposes a paradigm that leverages the intrinsic uncertainty of large language models (LLMs): it quantifies confidence fluctuations and answer disagreements—measured via entropy, variance, and consistency—across multi-LLM collaborative reasoning as proxy signals for question difficulty, and fuses them with question-stem text embeddings to train a random forest regression model. Contribution/Results: It is the first work to systematically model LLM cognitive uncertainty as an interpretable, annotation-free difficulty indicator, bypassing reliance on human labels or shallow textual features. Evaluated on the USMLE and CMCQRD datasets, it achieves state-of-the-art performance; uncertainty-aware features significantly improve prediction accuracy, and estimated difficulty correlates strongly and inversely with empirical student pass rates (r < −0.85).
📝 Abstract
In an educational setting, an estimate of the difficulty of multiple-choice questions (MCQs), a commonly used strategy to assess learning progress, constitutes very useful information for both teachers and students. Since human assessment is costly from multiple points of view, automatic approaches to MCQ item difficulty estimation have been investigated, though with mixed success so far. Our approach to this problem takes a different angle from previous work: asking various Large Language Models to tackle the questions included in three different MCQ datasets, we leverage model uncertainty to estimate item difficulty. Using both model uncertainty features and textual features in a Random Forest regressor, we show that uncertainty features contribute substantially to difficulty prediction, where difficulty is inversely proportional to the number of students who can correctly answer a question. In addition to demonstrating the value of our approach, we observe that our model achieves state-of-the-art results on the publicly available USMLE and CMCQRD datasets.
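The pipeline described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the uncertainty features (entropy of the pooled answer distribution, cross-model variance, and majority-vote consistency) follow the summary's description, while the simulated per-model answer probabilities, the random text embeddings, and the difficulty labels are stand-ins for real LLM outputs, question-stem embeddings, and annotated data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def uncertainty_features(answer_probs):
    """Summarize disagreement among models on one MCQ.

    answer_probs: (n_models, n_options) array of each model's
    probability distribution over the answer options.
    """
    p = np.asarray(answer_probs, dtype=float)
    mean_p = p.mean(axis=0)
    # Entropy of the pooled answer distribution (higher = more uncertain).
    entropy = -np.sum(mean_p * np.log(mean_p + 1e-12))
    # Average per-option variance across models (higher = more disagreement).
    variance = p.var(axis=0).mean()
    # Fraction of models agreeing with the majority answer.
    picks = p.argmax(axis=1)
    consistency = np.mean(picks == np.bincount(picks).argmax())
    return np.array([entropy, variance, consistency])

# Toy data: 20 questions, 3 "models", 4 answer options each.
rng = np.random.default_rng(0)
rows = []
for _ in range(20):
    probs = rng.dirichlet(np.ones(4), size=3)   # stand-in LLM answer distributions
    text_emb = rng.normal(size=8)               # stand-in question-stem embedding
    rows.append(np.concatenate([uncertainty_features(probs), text_emb]))
X = np.array(rows)
y = rng.uniform(size=20)                        # stand-in difficulty labels

# Fuse uncertainty and textual features in a Random Forest regressor.
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
pred = model.predict(X[:1])
```

With real data, `answer_probs` would come from repeated LLM queries per question, `text_emb` from a sentence encoder applied to the question stem, and `y` from empirical pass rates.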