KCS: Diversify Multi-hop Question Generation with Knowledge Composition Sampling

📅 2025-08-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-hop question answering suffers from spurious correlations due to sparse training data. Existing question generation methods emphasize syntactic diversity but neglect explicit modeling of critical knowledge—such as supporting sentence combinations—in the source document. To address this, we propose a Knowledge Composition Sampling (KCS) framework that decouples multi-hop question generation into two stages: sentence-level knowledge selection and question formulation. Knowledge selection is formalized as a conditional prediction task, optimized via a probabilistic contrastive loss and augmented with stochastic decoding to jointly enhance accuracy and diversity. On HotpotQA and 2WikiMultihopQA, KCS improves knowledge composition selection accuracy by 3.9%; when used for data augmentation, it yields significant gains in downstream QA performance. This work pioneers the treatment of knowledge composition as a learnable, sampleable, structured process—establishing a novel paradigm for constructing high-quality multi-hop reasoning data.

📝 Abstract
Multi-hop question answering faces substantial challenges due to data sparsity, which increases the likelihood of language models learning spurious patterns. To address this issue, prior research has focused on diversifying question generation through content planning and varied expression. However, these approaches often emphasize generating simple questions and neglect the integration of essential knowledge, such as relevant sentences within documents. This paper introduces Knowledge Composition Sampling (KCS), a framework designed to expand the diversity of generated multi-hop questions by sampling varied knowledge compositions within a given context. KCS models knowledge composition selection as a sentence-level conditional prediction task and uses a probabilistic contrastive loss to predict the next most relevant piece of knowledge. During inference, a stochastic decoding strategy balances accuracy and diversity. Compared to competitive baselines, KCS improves the overall accuracy of knowledge composition selection by 3.9%, and using it for data augmentation yields improvements on the HotpotQA and 2WikiMultihopQA datasets. Our code is available at: https://github.com/yangfanww/kcs.
Problem

Research questions and friction points this paper is trying to address.

Addresses data sparsity in multi-hop question answering
Diversifies question generation through knowledge composition sampling
Improves accuracy and diversity in knowledge integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge Composition Sampling for question diversity
Probabilistic contrastive loss for knowledge prediction
Stochastic decoding balances accuracy and diversity
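
To make the accuracy/diversity trade-off concrete, here is a minimal illustrative sketch (not the authors' implementation; the function name and toy scorer are hypothetical) of stochastic decoding over sentence-level knowledge selection: at each step, the next supporting sentence is sampled from a temperature-scaled softmax over the scores of sentences not yet selected, so low temperature approaches greedy (accurate) selection while higher temperature yields more diverse compositions.

```python
import math
import random

def sample_knowledge_composition(sentence_scores, k, temperature=1.0, seed=None):
    """Sample an ordered set of k supporting-sentence indices.

    sentence_scores: per-sentence relevance scores (hypothetical scorer output).
    temperature: lower -> near-greedy (accuracy), higher -> more diversity.
    """
    rng = random.Random(seed)
    remaining = list(range(len(sentence_scores)))
    composition = []
    for _ in range(min(k, len(remaining))):
        # Temperature-scaled softmax over the not-yet-selected sentences.
        logits = [sentence_scores[i] / temperature for i in remaining]
        m = max(logits)  # subtract max for numerical stability
        weights = [math.exp(l - m) for l in logits]
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        composition.append(remaining.pop(pick))
    return composition

# Near-greedy at low temperature: picks the two highest-scoring sentences.
greedy = sample_knowledge_composition([5.0, 1.0, 0.1, 4.0], k=2,
                                      temperature=0.01, seed=0)

# Higher temperature over uniform scores: still k distinct sentences,
# but the composition varies with the random seed.
diverse = sample_knowledge_composition([1.0, 1.0, 1.0, 1.0], k=3,
                                       temperature=1.0, seed=1)
```

Resampling with different seeds then yields multiple distinct knowledge compositions for the same context, which is the source of question diversity in the data-augmentation setting.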
Yangfan Wang
Harbin Institute of Technology
Jie Liu
Harbin Institute of Technology, National Key Laboratory of Smart Farm Technologies and Systems
Chen Tang
MemTensor (Shanghai) Technology Co., Ltd.
Lian Yan
Harbin Institute of Technology
Large Language Model · Dialogue System for Medical Diagnosis
Jingchi Jiang
Harbin Institute of Technology
Knowledge Graph · Machine Learning · Data Mining