FarsiMCQGen: a Persian Multiple-choice Question Generation Framework

📅 2025-10-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of generating high-quality multiple-choice questions (MCQs) for low-resource languages like Persian, this paper introduces the first end-to-end Persian MCQ generation framework. The method integrates knowledge graph–enhanced distractor generation, rule-guided candidate filtering, and Transformer/language model–driven question ranking. Leveraging Persian Wikipedia, the authors construct and publicly release the first large-scale Persian MCQ dataset (10,289 items). Experimental results show that the generated questions outperform baselines across multiple automatic and human evaluation metrics; further validation with mainstream large language models confirms their strong discriminative power and controllable difficulty. This work fills a critical gap in Persian educational-assessment NLP and provides both a reproducible methodology and benchmark resources for intelligent item generation in low-resource languages.

📝 Abstract
Multiple-choice questions (MCQs) are commonly used in educational testing, as they offer an efficient means of evaluating learners' knowledge. However, generating high-quality MCQs, particularly in low-resource languages such as Persian, remains a significant challenge. This paper introduces FarsiMCQGen, an innovative approach for generating Persian-language MCQs. Our methodology combines candidate generation, filtering, and ranking techniques to build a model that generates answer choices resembling those in real MCQs. We leverage advanced methods, including Transformers and knowledge graphs, integrated with rule-based approaches to craft credible distractors that challenge test-takers. Our work is based on data from Wikipedia, which includes general knowledge questions. Furthermore, this study introduces a novel Persian MCQ dataset comprising 10,289 questions. This dataset is evaluated by different state-of-the-art large language models (LLMs). Our results demonstrate the effectiveness of our model and the quality of the generated dataset, which has the potential to inspire further research on MCQs.
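The generate→filter→rank pipeline the abstract describes can be illustrated with a minimal sketch. Everything below (the toy knowledge base, the filtering rule, and the length-based ranking score) is invented for illustration and is not the paper's actual code or scoring model:

```python
# Toy sketch of a candidate-generation -> filtering -> ranking distractor pipeline.

def generate_candidates(answer, kb):
    """Candidate distractors: entities sharing the answer's type in a toy knowledge base."""
    answer_type = kb[answer]
    return [e for e, t in kb.items() if t == answer_type and e != answer]

def filter_candidates(candidates, answer):
    """Rule-guided filtering: drop candidates that trivially contain or equal the answer."""
    return [c for c in candidates
            if c.lower() != answer.lower() and answer.lower() not in c.lower()]

def rank_candidates(candidates, answer):
    """Rank by a stand-in plausibility score (here: closeness in length to the answer)."""
    return sorted(candidates, key=lambda c: abs(len(c) - len(answer)))

kb = {
    "Tehran": "city", "Isfahan": "city", "Shiraz": "city",
    "Tabriz": "city", "Iran": "country",
}

candidates = generate_candidates("Tehran", kb)
distractors = rank_candidates(filter_candidates(candidates, "Tehran"), "Tehran")[:3]
print(distractors)  # ['Shiraz', 'Tabriz', 'Isfahan']
```

In the paper the analogues of these stages use a knowledge graph for candidate generation, hand-crafted rules for filtering, and Transformer-based models for ranking; this sketch only shows how the three stages compose.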
Problem

Research questions and friction points this paper is trying to address.

Generating Persian multiple-choice questions for low-resource languages
Creating credible distractors using transformers and knowledge graphs
Evaluating a novel Persian MCQ dataset with large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines candidate generation, filtering, and ranking
Integrates Transformers with knowledge graphs and rules
Generates credible distractors for Persian MCQs
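One way a knowledge graph can yield credible distractors, as the bullets above describe, is to retrieve entities that share a relation and object with the correct answer. The triple store and entity names below are invented for this sketch and do not come from the paper:

```python
# Illustrative only: distractor retrieval from knowledge-graph triples by shared relation.

triples = [
    ("Hafez", "occupation", "poet"),
    ("Saadi", "occupation", "poet"),
    ("Rumi", "occupation", "poet"),
    ("Avicenna", "occupation", "philosopher"),
]

def kg_distractors(answer, relation, triples):
    """Entities linked by the same relation and object as the answer are
    semantically close, which makes them plausible distractors."""
    objects = {o for s, r, o in triples if s == answer and r == relation}
    return [s for s, r, o in triples
            if r == relation and o in objects and s != answer]

print(kg_distractors("Hafez", "occupation", triples))  # ['Saadi', 'Rumi']
```

Distractors drawn this way are hard precisely because they belong to the same semantic class as the answer, which is the property the rule-based filtering and ranking stages then refine.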