🤖 AI Summary
This study investigates the capabilities of large language models (LLMs) in text-driven question generation and how their output differs from human-authored questions. To this end, we propose the first systematic, multi-dimensional quantitative evaluation framework, which automatically assesses LLM-generated questions along six dimensions: question length, question-type distribution, context coverage, answerability, linguistic fluency, and semantic fidelity. We further introduce a novel LLM-based self-evaluation mechanism that requires no human annotation and enables end-to-end quality assessment of generated questions. Experimental results reveal a fundamental trade-off in LLM-generated questions between breadth of context coverage and answer precision, a pattern markedly different from human question-authoring behavior. The framework establishes a transferable analytical paradigm for question quality assessment and offers actionable insights for downstream applications such as question-answering systems and intelligent educational assessment tools.
📝 Abstract
This paper evaluates questions generated by LLMs from context, comparing them to human-generated questions across six dimensions. We introduce an automated LLM-based evaluation method, focusing on aspects like question length, type, context coverage, and answerability. Our findings highlight unique characteristics of LLM-generated questions, contributing insights that can support further research in question quality and downstream applications.
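To make the six dimensions concrete, here is a minimal sketch, not the authors' implementation, of how such metrics could be computed. Question length, type distribution, and context coverage use simple lexical heuristics; the LLM-judged dimensions (answerability, fluency, semantic fidelity) are stubbed behind a hypothetical `llm_judge` callable that the caller would supply, for example a wrapper around any chat-completion API.

```python
# Sketch of a multi-dimensional question-evaluation pipeline (illustrative only).
from collections import Counter
from typing import Callable, Iterable

WH_WORDS = ("what", "who", "whom", "whose", "where", "when", "why", "which", "how")

def question_type(question: str) -> str:
    """Classify a question by its leading wh-word; anything else is 'other' (e.g. yes/no)."""
    tokens = question.strip().lower().split()
    return tokens[0] if tokens and tokens[0] in WH_WORDS else "other"

def type_distribution(questions: Iterable[str]) -> dict:
    """Relative frequency of each question type over the generated set."""
    counts = Counter(question_type(q) for q in questions)
    total = sum(counts.values()) or 1
    return {t: c / total for t, c in counts.items()}

def avg_length(questions: list) -> float:
    """Mean question length in whitespace tokens."""
    return sum(len(q.split()) for q in questions) / max(len(questions), 1)

def context_coverage(context: str, questions: list) -> float:
    """Crude lexical proxy for coverage: share of distinct context tokens
    that appear in at least one generated question."""
    ctx_tokens = set(context.lower().split())
    q_tokens = set(" ".join(questions).lower().split())
    return len(ctx_tokens & q_tokens) / max(len(ctx_tokens), 1)

def llm_scores(context: str, questions: list,
               llm_judge: Callable[[str], float]) -> dict:
    """Average rubric scores from an LLM judge (assumed to map a prompt to a
    numeric score, e.g. 1-5); `llm_judge` is a hypothetical user-supplied hook."""
    rubrics = {
        "answerability": "Can the question be answered from the context alone?",
        "fluency": "Is the question grammatically fluent?",
        "semantic_fidelity": "Is the question faithful to the context's meaning?",
    }
    scores = {}
    for dim, rubric in rubrics.items():
        prompts = [f"Context:\n{context}\n\nQuestion: {q}\n\n{rubric} Score 1-5."
                   for q in questions]
        scores[dim] = sum(llm_judge(p) for p in prompts) / max(len(prompts), 1)
    return scores

if __name__ == "__main__":
    ctx = "The Amazon rainforest produces roughly 20 percent of the world's oxygen."
    qs = ["What does the Amazon rainforest produce?",
          "How much of the world's oxygen comes from the Amazon?"]
    print(avg_length(qs), type_distribution(qs), round(context_coverage(ctx, qs), 2))
```

The heuristic metrics above stand in for whatever tokenization and coverage definitions the paper actually uses; only the overall structure (surface metrics plus an annotation-free LLM judge) mirrors the framework described in the summary.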