Beyond Fine-Tuning: In-Context Learning and Chain-of-Thought for Reasoned Distractor Generation

📅 2026-04-19

🤖 AI Summary
This work addresses two challenges in distractor generation: reliance on expert knowledge and the difficulty of modeling human-like implicit reasoning. It proposes the first unsupervised framework that integrates chain-of-thought reasoning with in-context learning. Using large language models, the method selects exemplars through unsupervised semantic retrieval and applies chain-of-thought prompting to generate distractors jointly with their underlying reasoning, without any model fine-tuning. The resulting distractors align closely with those in human-labeled benchmarks. Evaluated on six benchmark datasets spanning diverse domains and distractor lengths, the approach substantially outperforms existing methods and achieves state-of-the-art performance.
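The exemplar-selection step can be illustrated with a minimal sketch. It assumes a toy bag-of-words `embed` function as a stand-in for whatever sentence encoder the authors use, and the names `retrieve_exemplars` and `POOL` are hypothetical; the summary does not specify the retrieval model or the similarity measure, so cosine similarity is an assumption here.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence encoder: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_exemplars(query: str, pool: list[dict], k: int = 2) -> list[dict]:
    """Return the k pool items whose questions are most similar to the query.

    No labels or training are involved, which is what makes the
    selection unsupervised in this sketch.
    """
    q = embed(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, embed(ex["question"])), reverse=True)
    return ranked[:k]


# Hypothetical exemplar pool; real pools would come from a benchmark's training split.
POOL = [
    {"question": "Which planet is closest to the Sun?", "answer": "Mercury"},
    {"question": "What gas do plants absorb during photosynthesis?", "answer": "Carbon dioxide"},
    {"question": "Which planet is the largest in the solar system?", "answer": "Jupiter"},
]

top = retrieve_exemplars("Which planet is known as the Red Planet?", POOL, k=2)
for ex in top:
    print(ex["question"])
```

In practice the toy `embed` would be swapped for a dense sentence encoder, but the ranking logic would stay the same.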

📝 Abstract
Distractor generation (DG) remains a labor-intensive task that still depends heavily on domain experts. The task is to generate plausible yet incorrect options, known as distractors, for multiple-choice questions. A reliable distractor must be contextually relevant to the question and able to mislead examinees through implicit reasoning as they try to identify the correct answer. A recent method combines fine-tuning of pre-trained encoder-decoder models with contrastive learning to generate semantically relevant distractors for a given question-answer pair, but it often fails to capture the underlying reasoning process that experts use when selecting distractors in benchmarks. In this paper, we explore large language model (LLM) reasoning for DG through in-context learning, using unsupervised semantic retrieval to select few-shot examples. We design a rationale-augmented DG framework that jointly generates distractors and their rationales for a given question-answer pair. Extensive experiments on six benchmarks, which vary in domain and average distractor length, demonstrate that prompting LLMs with few-shot examples substantially improves performance over recent DG models and achieves state-of-the-art results in generating reasoned distractors that align with human-labeled benchmarks.
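To make the rationale-augmented idea concrete, here is a hedged sketch of how a few-shot prompt might pair each distractor with a rationale and how the model output might be parsed. The prompt wording, the `Rationale:`/`Distractor:` line format, the function names, and the `fake_llm` stub are all assumptions for illustration; the abstract does not give the actual template. Filtering out candidates equal to the correct answer is an extra sanity check added here, not something the source describes.

```python
import re
from typing import Callable


def build_prompt(exemplars: list[dict], question: str, answer: str, n: int = 3) -> str:
    """Assemble a few-shot prompt that asks for a rationale before each distractor."""
    parts = [
        f"Write {n} plausible but incorrect options for the final question. "
        "Before each distractor, explain why an examinee might choose it."
    ]
    for ex in exemplars:
        parts.append(f"Question: {ex['question']}\nAnswer: {ex['answer']}")
        for rationale, distractor in ex["reasoned_distractors"]:
            parts.append(f"Rationale: {rationale}\nDistractor: {distractor}")
    parts.append(f"Question: {question}\nAnswer: {answer}")
    return "\n\n".join(parts)


# Assumed output format: a one-line rationale followed by a one-line distractor.
PAIR = re.compile(r"Rationale:\s*(.+?)\s*\nDistractor:\s*(.+)")


def parse_output(text: str) -> list[tuple[str, str]]:
    """Extract (rationale, distractor) pairs from the model's raw text."""
    return [(r.strip(), d.strip()) for r, d in PAIR.findall(text)]


def generate(llm: Callable[[str], str], exemplars: list[dict],
             question: str, answer: str, n: int = 3) -> list[tuple[str, str]]:
    """Prompt the model once and return up to n reasoned distractors."""
    pairs = parse_output(llm(build_prompt(exemplars, question, answer, n)))
    # Assumption: discard any candidate that repeats the correct answer.
    return [(r, d) for r, d in pairs if d.lower() != answer.lower()][:n]


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call so the sketch runs offline."""
    return (
        "Rationale: Venus is the brightest planet, so it is easily recalled.\n"
        "Distractor: Venus\n\n"
        "Rationale: This simply repeats the key.\n"
        "Distractor: Mars\n\n"
        "Rationale: Jupiter has a famous red spot.\n"
        "Distractor: Jupiter"
    )


EXEMPLARS = [{
    "question": "Which planet is closest to the Sun?",
    "answer": "Mercury",
    "reasoned_distractors": [("Venus is also an inner planet.", "Venus")],
}]

result = generate(fake_llm, EXEMPLARS, "Which planet is known as the Red Planet?", "Mars")
for rationale, distractor in result:
    print(f"{distractor}: {rationale}")
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real call would replace `fake_llm`.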

Problem

Research questions and friction points this paper is trying to address.

distractor generation
reasoned distractors
multiple-choice questions
contextual relevance
implicit reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

in-context learning
chain-of-thought
distractor generation
large language models
rationale-augmented generation