Diversifying Question Generation over Knowledge Base via External Natural Questions

📅 2023-09-23
🏛️ International Conference on Language Resources and Evaluation
📈 Citations: 3
Influential: 0
🤖 AI Summary
This work addresses the insufficient semantic diversity in Knowledge Base Question Generation (KBQG). We identify that conventional n-gram uniqueness–based diversity metrics conflate lexical repetition with genuine semantic diversity. To resolve this, we propose a relevance-constrained top-k question set diversity metric and design a dual-model collaborative framework: a primary generation model ensures knowledge base alignment, while an auxiliary model retrieves and injects external natural questions to enhance expressive diversity. Additionally, we introduce a semantic similarity–driven diversity optimization mechanism. Evaluated on mainstream KBQG benchmarks, our approach significantly outperforms PLM-based baselines and text-davinci-003 in diversity metrics, while achieving question quality comparable to ChatGPT. To our knowledge, this is the first method to jointly improve both semantic diversity and relevance—demonstrating a principled trade-off between expressiveness and fidelity in KBQG.
📝 Abstract
Previous methods on knowledge base question generation (KBQG) primarily focus on refining the quality of a single generated question. However, considering the remarkable paraphrasing ability of humans, we believe that diverse texts can express identical semantics through varied expressions. This insight makes diversifying question generation an intriguing task, where the first challenge is devising evaluation metrics for diversity. Current metrics inadequately assess this kind of diversity: they calculate the ratio of unique n-grams in the generated question, which tends to measure duplication rather than true diversity. Accordingly, we devise a new diversity evaluation metric, which measures the diversity among the top-k generated questions for each instance while ensuring their relevance to the ground truth. The second challenge is how to enhance diversifying question generation. To address it, we introduce a dual-model framework interwoven with two selection strategies that generates diverse questions by leveraging external natural questions. The main idea of our dual framework is to extract more diverse expressions and integrate them into the generation model to enhance diversifying question generation. Extensive experiments on widely used KBQG benchmarks show that our approach outperforms pre-trained language model baselines and text-davinci-003 in diversity while achieving performance comparable to ChatGPT.
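The abstract describes a relevance-constrained top-k diversity metric: diversity is measured as pairwise dissimilarity among the top-k generated questions, but only over those questions that remain relevant to the ground truth. The paper does not give a formula here, so the sketch below is a minimal illustration of that idea, assuming a simple Jaccard token overlap as a stand-in for whatever semantic similarity function the authors actually use; the function names and the relevance threshold are hypothetical.

```python
from itertools import combinations

def token_sim(a: str, b: str) -> float:
    """Jaccard token overlap -- a stand-in for a learned semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def topk_diversity(questions: list[str], ground_truth: str,
                   rel_threshold: float = 0.3) -> float:
    """Diversity among top-k generated questions, restricted to those
    sufficiently similar to the ground truth (the relevance constraint).

    Returns the mean pairwise dissimilarity (1 - sim) over the relevant subset.
    """
    relevant = [q for q in questions if token_sim(q, ground_truth) >= rel_threshold]
    if len(relevant) < 2:
        return 0.0  # no diversity to measure with fewer than two relevant questions
    pairs = list(combinations(relevant, 2))
    return sum(1.0 - token_sim(a, b) for a, b in pairs) / len(pairs)
```

The relevance filter is what separates this from plain n-gram uniqueness scores: a set of fluent but off-target paraphrases would score zero here, because irrelevant questions never enter the pairwise comparison.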
Problem

Research questions and friction points this paper is trying to address.

Enhancing diversity in knowledge base question generation
Developing a new metric to evaluate question diversity
Integrating external natural questions to improve diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

New diversity metric for question generation
Dual model framework for diverse questions
Integration of external natural questions
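The dual-model framework pairs a generation model (which keeps questions aligned with the knowledge base) with an auxiliary model that retrieves external natural questions to diversify expression. As a rough illustration of the retrieval side only, the sketch below ranks an external question pool by similarity to a draft question; the pool, the overlap-based similarity, and the function names are all assumptions, not the paper's actual selection strategies.

```python
def overlap_sim(a: str, b: str) -> float:
    """Jaccard token overlap -- a placeholder for the auxiliary model's scorer."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def retrieve_external(draft: str, pool: list[str], n: int = 3) -> list[str]:
    """Return the n external natural questions most similar to the draft.

    In the paper's framework, retrieved questions like these would be fed
    back into the generation model as additional expressive context.
    """
    return sorted(pool, key=lambda q: overlap_sim(draft, q), reverse=True)[:n]
```

The design point is that the external questions are human-written, so their phrasing patterns are naturally more varied than what a single fine-tuned generator tends to produce.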
Shasha Guo
Renmin University of China
Natural Language Processing · Large Language Model
Jing Zhang
School of Information, Renmin University of China, Beijing, China
Xirui Ke
School of Information, Renmin University of China, Beijing, China
Cuiping Li
Renmin University of China
Database · Big Data Analysis and Mining
Hong Chen
School of Information, Renmin University of China, Beijing, China