Can LLMs Design Good Questions Based on Context?

📅 2025-01-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the capabilities of large language models (LLMs) in text-driven question generation and how their behavior differs from that of humans. To this end, we propose the first systematic, multi-dimensional quantitative evaluation framework, which automatically assesses LLM-generated questions along six dimensions: question length, question-type distribution, context coverage, answerability, linguistic fluency, and semantic fidelity. We further introduce a novel LLM-based self-evaluation mechanism that requires no human annotation and enables end-to-end quality assessment of generated questions. Experimental results reveal a fundamental trade-off in LLM-generated questions between breadth of contextual coverage and answer precision, a pattern markedly distinct from human question-authoring behavior. The framework establishes a transferable paradigm for question-quality assessment, offering actionable insights for optimizing downstream applications such as question-answering systems and intelligent educational assessment tools.
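
To make the six dimensions concrete, here is a minimal sketch, assuming simple lexical heuristics, of how the surface dimensions (question length, type distribution, context coverage) could be computed. All names here (question_type, context_coverage, surface_scores) are illustrative stand-ins, not the paper's released code.

```python
import re
from collections import Counter

# Leading wh-words used for a crude question-type classifier (assumption,
# not the paper's taxonomy).
QUESTION_TYPES = ("what", "who", "when", "where", "why", "how", "which")

STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "of",
             "to", "in", "and", "or", "did", "does", "do"}

def question_type(question: str) -> str:
    """Classify a question by its leading wh-word (heuristic stand-in)."""
    tokens = question.strip().lower().split()
    return tokens[0] if tokens and tokens[0] in QUESTION_TYPES else "other"

def type_distribution(questions: list[str]) -> Counter:
    """Aggregate the question-type distribution over a batch of questions."""
    return Counter(question_type(q) for q in questions)

def context_coverage(question: str, context: str) -> float:
    """Fraction of the question's content words that also occur in the context."""
    q_words = set(re.findall(r"[a-z']+", question.lower())) - STOPWORDS
    c_words = set(re.findall(r"[a-z']+", context.lower()))
    return len(q_words & c_words) / len(q_words) if q_words else 0.0

def surface_scores(question: str, context: str) -> dict:
    """Cheap surface dimensions; answerability, fluency, and semantic
    fidelity need an LLM judge (see the self-evaluation sketch below)."""
    return {
        "length_tokens": len(question.split()),
        "type": question_type(question),
        "context_coverage": round(context_coverage(question, context), 3),
    }

if __name__ == "__main__":
    ctx = "Marie Curie won the Nobel Prize in Physics in 1903."
    print(surface_scores("When did Marie Curie win the Nobel Prize?", ctx))
    # {'length_tokens': 8, 'type': 'when', 'context_coverage': 0.667}
```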

📝 Abstract
This paper evaluates questions generated by LLMs from context, comparing them to human-generated questions across six dimensions. We introduce an automated LLM-based evaluation method, focusing on aspects like question length, type, context coverage, and answerability. Our findings highlight unique characteristics of LLM-generated questions, contributing insights that can support further research in question quality and downstream applications.
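
The automated LLM-based evaluation method can be pictured as an LLM-as-judge loop: the same (or another) model scores each generated question against its context, with no human annotation. The sketch below is only an assumption about how such scoring might be wired up; the prompt wording, the judge_question helper, and the fake_llm stub are all hypothetical.

```python
import json

# Rubric prompt for the LLM judge; the exact wording is an assumption,
# not the prompt used in the paper.
JUDGE_PROMPT = """You are grading a machine-generated question against its source context.

Context:
{context}

Question:
{question}

Return JSON only, with integer scores from 1 (poor) to 5 (excellent) for:
  "answerability": can the question be answered from the context alone?
  "fluency": is the question grammatical and natural?
  "fidelity": is the question faithful to the context (no invented facts)?
"""

def judge_question(question: str, context: str, call_llm) -> dict:
    """Score the semantic dimensions with an LLM judge.
    `call_llm` is any callable mapping a prompt string to the model's reply."""
    raw = call_llm(JUDGE_PROMPT.format(context=context, question=question))
    return json.loads(raw)

if __name__ == "__main__":
    # Stub backend so the sketch runs without an API key; swap in a real
    # chat-completion call from your provider.
    def fake_llm(prompt: str) -> str:
        return '{"answerability": 5, "fluency": 4, "fidelity": 5}'

    ctx = "Marie Curie won the Nobel Prize in Physics in 1903."
    print(judge_question("When did Marie Curie win the Nobel Prize?", ctx, fake_llm))
```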
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Question Generation
Human-vs-AI Comparison
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Automatic Question Generation
Comprehensive Evaluation