SMART: Simulated Students Aligned with Item Response Theory for Question Difficulty Prediction

📅 2025-07-07

📈 Citations: 0

✨ Influential: 0

career value

158K/year

🤖 AI Summary

Traditional item difficulty estimation relies on real student response data to fit Item Response Theory (IRT) models, incurring high data collection costs and failing to address the cold-start problem for newly introduced open-ended items. Method: This paper proposes SMART, the first framework enabling cold-start difficulty prediction for open-ended items without requiring real responses. SMART leverages large language models (LLMs) to synthesize controllable, IRT-aligned artificial students; calibrates their ability distribution via Direct Preference Optimization (DPO); and infers item difficulty by generating synthetic responses and fitting IRT models thereto. Contribution/Results: Experiments on real student datasets demonstrate that SMART significantly outperforms existing methods across prediction accuracy, generalizability, and scalability. It establishes a novel, efficient, and robust paradigm for item difficulty estimation—enabling scalable personalized learning and psychometric assessment without reliance on empirical response data.

Technology Category

Application Category

📝 Abstract

Item (question) difficulties play a crucial role in educational assessments, enabling accurate and efficient assessment of student abilities and personalization to maximize learning outcomes. Traditionally, estimating item difficulties can be costly, requiring real students to respond to items, followed by fitting an item response theory (IRT) model to get item difficulty estimates. This approach cannot be applied to the cold-start setting for previously unseen items either. In this work, we present SMART (Simulated Students Aligned with IRT), a novel method for aligning simulated students with instructed ability, which can then be used in simulations to predict the difficulty of open-ended items. We achieve this alignment using direct preference optimization (DPO), where we form preference pairs based on how likely responses are under a ground-truth IRT model. We perform a simulation by generating thousands of responses, evaluating them with an LLM-based scoring model, and fit the resulting data to an IRT model to obtain item difficulty estimates. Through extensive experiments on a real-world student response dataset, we show that SMART outperforms other item difficulty prediction methods by leveraging its improved ability alignment.

Problem

Research questions and friction points this paper is trying to address.

Predicts difficulty of unseen questions in education

Reduces cost of estimating item difficulties

Aligns simulated students with IRT for accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Align simulated students with IRT using DPO

Generate responses via LLM-based scoring model

Predict item difficulties with simulated student data

🔎 Similar Papers

Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty?

2024-07-07International Conference on Computational LinguisticsCitations: 0

OpenAI

$380K – $445K • Offers Equity

San Francisco, CA, USA

Authors to Follow