Difficulty-Controllable Cloze Question Distractor Generation

📅 2025-11-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing cloze distractor generation methods lack controllable difficulty levels and high-quality annotated data. To address this, we introduce the first distractor dataset with fine-grained difficulty annotations, built with a joint annotation framework that combines bidirectional (two-way) generation with ensemble-QA-based difficulty classification, enabling fully automated, multi-level difficulty labeling of distractors. We further design a multi-task sequence-to-sequence model that jointly models semantic matching and difficulty prediction, incorporating distractor filtering and data augmentation to enhance robustness. Experiments demonstrate that our method generates significantly higher-quality distractors across multiple difficulty levels than GPT-4o, achieving state-of-the-art alignment with human-perceived difficulty. This work establishes an interpretable, controllable paradigm for personalized language proficiency assessment.

📝 Abstract
Multiple-choice cloze questions are commonly used to assess linguistic proficiency and comprehension. However, generating high-quality distractors remains challenging, as existing methods often lack adaptability and control over difficulty levels, and the absence of difficulty-annotated datasets further hinders progress. To address these issues, we propose a novel framework for generating distractors with controllable difficulty by leveraging both data augmentation and a multitask learning strategy. First, to create a high-quality, difficulty-annotated dataset, we introduce a two-way distractor generation process in order to produce diverse and plausible distractors. These candidates are subsequently refined through filtering and then categorized by difficulty using an ensemble QA system. Second, this newly created dataset is leveraged to train a difficulty-controllable generation model via multitask learning. The framework includes carefully designed auxiliary tasks that enhance the model's semantic understanding of distractors and its ability to estimate their difficulty. Experimental results demonstrate that our method generates high-quality distractors across difficulty levels and substantially outperforms GPT-4o in aligning distractor difficulty with human perception.
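The ensemble-QA difficulty annotation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the scoring rule (fraction of QA models fooled by a distractor) and the three-level thresholds are assumptions, and the function and variable names are hypothetical.

```python
# Hedged sketch of ensemble-QA difficulty annotation: a candidate distractor
# is graded by the fraction of QA models it fools (models that pick the
# distractor over the gold answer). Thresholds are illustrative assumptions.

def difficulty_level(question, answer, distractor, qa_models):
    """Label a distractor 'easy', 'medium', or 'hard' from ensemble votes."""
    fooled = sum(
        1 for model in qa_models
        if model(question, [answer, distractor]) == distractor
    )
    ratio = fooled / len(qa_models)
    if ratio >= 2 / 3:
        return "hard"    # most models prefer the distractor
    if ratio >= 1 / 3:
        return "medium"  # some models are confused
    return "easy"        # the ensemble reliably rejects it

# Toy QA "models": each maps (question, options) -> chosen option.
always_fooled = lambda q, opts: opts[1]
never_fooled = lambda q, opts: opts[0]
models = [always_fooled, never_fooled, never_fooled]

label = difficulty_level("The cat ___ on the mat.", "sat", "seat", models)
print(label)  # -> medium (1 of 3 models fooled)
```

In practice the ensemble members would be trained QA systems of varying strength rather than fixed rules, so the fooled-model count serves as a proxy for how challenging the distractor is to a spectrum of readers.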
Problem

Research questions and friction points this paper is trying to address.

Generating distractors with controllable difficulty for cloze questions
Overcoming the lack of difficulty-annotated datasets through data augmentation
Enhancing semantic understanding and difficulty estimation via multitask learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Difficulty-annotated dataset creation via two-way generation
Multitask learning framework with auxiliary semantic tasks
Difficulty-controllable distractor generation outperforming GPT-4o
Seokhoon Kang
Graduate School of Artificial Intelligence, POSTECH, South Korea
Yejin Jeon
POSTECH
Speech Synthesis · Signal Processing · Natural Language Processing
Seonjeong Hwang
Graduate School of Artificial Intelligence, POSTECH, South Korea
Gary Geunbae Lee
Graduate School of Artificial Intelligence, POSTECH, South Korea; Department of Computer Science and Engineering, POSTECH, South Korea