Difficulty-Controllable Cloze Question Distractor Generation

📅 2025-11-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing cloze distractor generation methods lack controllable difficulty levels and high-quality annotated data. To address this, we introduce the first distractor dataset with fine-grained difficulty annotations, built with a joint annotation framework that combines bidirectional (two-way) generation with ensemble-QA-based difficulty classification, enabling fully automated, multi-level difficulty labeling of distractors. We further design a multi-task sequence-to-sequence model that jointly models semantic matching and difficulty prediction, incorporating distractor filtering and data augmentation to enhance robustness. Experiments demonstrate that our method generates significantly higher-quality distractors across multiple difficulty levels than GPT-4o, achieving state-of-the-art alignment with human-perceived difficulty. This work establishes an interpretable, controllable paradigm for personalized language proficiency assessment.

📝 Abstract
Multiple-choice cloze questions are commonly used to assess linguistic proficiency and comprehension. However, generating high-quality distractors remains challenging, as existing methods often lack adaptability and control over difficulty levels, and the absence of difficulty-annotated datasets further hinders progress. To address these issues, we propose a novel framework for generating distractors with controllable difficulty by leveraging both data augmentation and a multitask learning strategy. First, to create a high-quality, difficulty-annotated dataset, we introduce a two-way distractor generation process in order to produce diverse and plausible distractors. These candidates are subsequently refined through filtering and then categorized by difficulty using an ensemble QA system. Second, this newly created dataset is leveraged to train a difficulty-controllable generation model via multitask learning. The framework includes carefully designed auxiliary tasks that enhance the model's semantic understanding of distractors and its ability to estimate their difficulty. Experimental results demonstrate that our method generates high-quality distractors across difficulty levels and substantially outperforms GPT-4o in aligning distractor difficulty with human perception.
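The ensemble-QA difficulty annotation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the scoring rule (fraction of QA models fooled by a distractor) and the three-level thresholds are assumptions, and the function and variable names are hypothetical.

```python
# Hedged sketch of ensemble-QA difficulty annotation: a candidate distractor
# is graded by the fraction of QA models it fools (models that pick the
# distractor over the gold answer). Thresholds are illustrative assumptions.

def difficulty_level(question, answer, distractor, qa_models):
    """Label a distractor 'easy', 'medium', or 'hard' from ensemble votes."""
    fooled = sum(
        1 for model in qa_models
        if model(question, [answer, distractor]) == distractor
    )
    ratio = fooled / len(qa_models)
    if ratio >= 2 / 3:
        return "hard"    # most models prefer the distractor
    if ratio >= 1 / 3:
        return "medium"  # some models are confused
    return "easy"        # the ensemble reliably rejects it

# Toy QA "models": each maps (question, options) -> chosen option.
always_fooled = lambda q, opts: opts[1]
never_fooled = lambda q, opts: opts[0]
models = [always_fooled, never_fooled, never_fooled]

label = difficulty_level("The cat ___ on the mat.", "sat", "seat", models)
print(label)  # -> medium (1 of 3 models fooled)
```

In practice the ensemble members would be trained QA systems of varying strength rather than fixed rules, so the fooled-model count serves as a proxy for how challenging the distractor is to a spectrum of readers.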
Problem

Research questions and friction points this paper is trying to address.

Generating distractors with controllable difficulty for cloze questions
Overcoming the lack of difficulty-annotated datasets through data augmentation
Enhancing semantic understanding and difficulty estimation via multitask learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Difficulty-annotated dataset creation via two-way generation
Multitask learning framework with auxiliary semantic tasks
Difficulty-controllable distractor generation outperforming GPT-4o
Seokhoon Kang
Graduate School of Artificial Intelligence, POSTECH, South Korea
Yejin Jeon
POSTECH
Speech Synthesis · Signal Processing · Natural Language Processing
Seonjeong Hwang
Graduate School of Artificial Intelligence, POSTECH, South Korea
Gary Geunbae Lee
Graduate School of Artificial Intelligence, POSTECH, South Korea; Department of Computer Science and Engineering, POSTECH, South Korea