KCS: Diversify Multi-hop Question Generation with Knowledge Composition Sampling

📅 2025-08-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-hop question answering suffers from spurious correlations due to sparse training data. Existing question generation methods emphasize syntactic diversity but neglect explicit modeling of critical knowledge—such as supporting sentence combinations—in the source document. To address this, we propose a Knowledge Composition Sampling (KCS) framework that decouples multi-hop question generation into two stages: sentence-level knowledge selection and question formulation. Knowledge selection is formalized as a conditional prediction task, optimized via a probabilistic contrastive loss and augmented with stochastic decoding to jointly enhance accuracy and diversity. On HotpotQA and 2WikiMultihopQA, KCS improves knowledge composition selection accuracy by 3.9%; when used for data augmentation, it yields significant gains in downstream QA performance. This work pioneers the treatment of knowledge composition as a learnable, sampleable, structured process—establishing a novel paradigm for constructing high-quality multi-hop reasoning data.

📝 Abstract
Multi-hop question answering faces substantial challenges due to data sparsity, which increases the likelihood of language models learning spurious patterns. To address this issue, prior research has focused on diversifying question generation through content planning and varied expression. However, these approaches often emphasize generating simple questions and neglect the integration of essential knowledge, such as relevant sentences within documents. This paper introduces Knowledge Composition Sampling (KCS), a framework designed to expand the diversity of generated multi-hop questions by sampling varied knowledge compositions within a given context. KCS models knowledge composition selection as a sentence-level conditional prediction task and uses a probabilistic contrastive loss to predict the next most relevant piece of knowledge. During inference, a stochastic decoding strategy balances accuracy and diversity. Compared to competitive baselines, KCS improves the overall accuracy of knowledge composition selection by 3.9%, and using it for data augmentation yields improvements on the HotpotQA and 2WikiMultihopQA datasets. Our code is available at: https://github.com/yangfanww/kcs.
Problem

Research questions and friction points this paper is trying to address.

Addresses data sparsity in multi-hop question answering
Diversifies question generation through knowledge composition sampling
Improves accuracy and diversity in knowledge integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge Composition Sampling for question diversity
Probabilistic contrastive loss for knowledge prediction
Stochastic decoding balances accuracy and diversity
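
To make the accuracy/diversity trade-off concrete, here is a minimal illustrative sketch (not the authors' implementation; the function name and toy scorer are hypothetical) of stochastic decoding over sentence-level knowledge selection: at each step, the next supporting sentence is sampled from a temperature-scaled softmax over the scores of sentences not yet selected, so low temperature approaches greedy (accurate) selection while higher temperature yields more diverse compositions.

```python
import math
import random

def sample_knowledge_composition(sentence_scores, k, temperature=1.0, seed=None):
    """Sample an ordered set of k supporting-sentence indices.

    sentence_scores: per-sentence relevance scores (hypothetical scorer output).
    temperature: lower -> near-greedy (accuracy), higher -> more diversity.
    """
    rng = random.Random(seed)
    remaining = list(range(len(sentence_scores)))
    composition = []
    for _ in range(min(k, len(remaining))):
        # Temperature-scaled softmax over the not-yet-selected sentences.
        logits = [sentence_scores[i] / temperature for i in remaining]
        m = max(logits)  # subtract max for numerical stability
        weights = [math.exp(l - m) for l in logits]
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        composition.append(remaining.pop(pick))
    return composition

# Near-greedy at low temperature: picks the two highest-scoring sentences.
greedy = sample_knowledge_composition([5.0, 1.0, 0.1, 4.0], k=2,
                                      temperature=0.01, seed=0)

# Higher temperature over uniform scores: still k distinct sentences,
# but the composition varies with the random seed.
diverse = sample_knowledge_composition([1.0, 1.0, 1.0, 1.0], k=3,
                                       temperature=1.0, seed=1)
```

Resampling with different seeds then yields multiple distinct knowledge compositions for the same context, which is the source of question diversity in the data-augmentation setting.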
Yangfan Wang
Harbin Institute of Technology
Jie Liu
Harbin Institute of Technology, National Key Laboratory of Smart Farm Technologies and Systems
Chen Tang
MemTensor (Shanghai) Technology Co., Ltd.
Lian Yan
Harbin Institute of Technology
Large Language Model · Dialogue System for Medical Diagnosis
Jingchi Jiang
Harbin Institute of Technology
Knowledge Graph · Machine Learning · Data Mining