Savaal: Scalable Concept-Driven Question Generation to Enhance Human Learning

📅 2025-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generating high-quality, reasoning-oriented questions (rather than fact-recall ones) from long documents such as hundred-page doctoral dissertations remains a significant challenge. Method: This paper proposes a three-stage, concept-driven question generation framework: (1) knowledge-graph-guided concept extraction; (2) hierarchical concept relation modeling; and (3) constrained prompting with reflective rewriting. Unlike conventional QA generation methods, which mostly test shallow memorization, the approach supports scalable, domain-independent generation of questions that probe deeper understanding. Contribution/Results: In an evaluation by 76 domain experts on 71 papers and PhD dissertations, the generated questions test depth of understanding 6.5× better on dissertations and 1.5× better on papers than those of a direct-prompting LLM baseline. Moreover, its advantages in question quality and cost become more pronounced as document length increases. The work validates the efficacy and practicality of concept-driven question generation on long, technically dense academic texts.

📝 Abstract
Assessing and enhancing human learning through question-answering is vital, yet automating this process remains challenging. While large language models (LLMs) excel at summarization and query responses, their ability to generate meaningful questions for learners is underexplored. We propose Savaal, a scalable question-generation system with three objectives: (i) scalability, enabling question generation from hundreds of pages of text; (ii) depth of understanding, producing questions that go beyond factual recall to test conceptual reasoning; and (iii) domain-independence, automatically generating questions across diverse knowledge areas. Instead of providing an LLM with large documents as context, Savaal improves results with a three-stage processing pipeline. Our evaluation with 76 human experts on 71 papers and PhD dissertations shows that Savaal's questions are 6.5X better at testing depth of understanding for dissertations and 1.5X better for papers than those of a direct-prompting LLM baseline. Notably, as document length increases, Savaal's advantages in question quality and cost become more pronounced.

Problem

Research questions and friction points this paper is trying to address.

Automating meaningful question generation for learners
Enhancing depth of understanding through conceptual questions
Ensuring scalability and domain-independence in question generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable question-generation system
Three-stage processing pipeline
Domain-independent question generation
Kimia Noorbakhsh
MIT
Language Models · Statistical Inference · ML for Networks
Joseph Chandler
M.I.T. Computer Science and Artificial Intelligence Lab (CSAIL)
Pantea Karimi
M.I.T. Computer Science and Artificial Intelligence Lab (CSAIL)
Mohammad Alizadeh
M.I.T. Computer Science and Artificial Intelligence Lab (CSAIL)
Harinarayanan Balakrishnan
M.I.T. Computer Science and Artificial Intelligence Lab (CSAIL)