D-SCoRE: Document-Centric Segmentation and CoT Reasoning with Structured Export for QA-CoT Data Generation

📅 2025-08-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
High-quality domain-specific question-answering (QA) data is scarce and expensive to construct, severely hindering supervised fine-tuning (SFT) of domain large language models (LLMs). To address this, we propose D-SCoRE—a training-free, prompt-engineering-based framework for automated QA-chain-of-thought (CoT) data generation. D-SCoRE integrates multiple controllable mechanisms: document-centric text segmentation, semantic role transformation, question-type balancing, and counterfactual augmentation—combined with CoT reasoning and structured output formatting. It enables efficient generation on consumer-grade hardware: an 8B-parameter model produces six four-choice counterfactual QA-CoT samples per 100–200-word input in under 90 seconds. Experiments demonstrate that models fine-tuned on D-SCoRE-generated data significantly outperform baselines on SQuADShifts and Covid-QA, achieving performance comparable to that obtained with human-annotated data.

📝 Abstract
The scarcity and high cost of high-quality question-answering (QA) datasets hinder supervised fine-tuning (SFT) for domain-specific large language models (LLMs). To address this, we introduce D-SCoRE, a training-free pipeline that utilizes LLMs and prompt engineering to produce diverse, high-quality QA datasets from arbitrary textual sources. D-SCoRE integrates **D**ocument-centric processing, **S**egmentation, **Co**T **R**easoning, and structured **E**xport to generate QA-CoT datasets tailored for domain-aware SFT. Multi-dimensional control mechanisms, such as semantic role transformation, question-type balancing, and counterfactual materials, enhance diversity and relevance, overcoming limitations of existing QA generation. LLMs fine-tuned on D-SCoRE-generated QA datasets and on human-annotated QA datasets (SQuAD, Covid-QA) are evaluated on the SQuADShifts and Covid-QA test sets, with D-SCoRE outperforming across most domains. D-SCoRE generates six QA-CoT pairs with four-option counterfactual materials per 100-200-word text in 90 seconds using an 8B LLM on consumer-grade hardware. Its simplicity and scalability enable efficient QA generation and high-performance fine-tuning across domains.
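The paper's prompts and code are not reproduced on this page, but the pipeline described above can be sketched in a few dozen lines. The following Python sketch is an assumption-laden illustration, not the authors' implementation: the segmentation rule, the prompt wording, the JSON schema, and the injected `generate(prompt) -> str` LLM call (any local 8B model behind a text-generation API would do) are all placeholders chosen to mirror the description in the abstract.

```python
import json
from typing import Callable, Iterator

def segment_document(text: str, max_words: int = 200) -> Iterator[str]:
    """Document-centric segmentation: cut the source into ~100-200-word passages.
    A fixed-size word window stands in for whatever boundary-aware rules the
    paper actually applies."""
    words = text.split()
    for start in range(0, len(words), max_words):
        yield " ".join(words[start:start + max_words])

# Hypothetical prompt combining the controls described in the abstract:
# question-type balancing, semantic-role variation, counterfactual options,
# CoT reasoning, and a structured (JSON) export format.
PROMPT_TEMPLATE = """You are creating supervised fine-tuning data from the passage below.
Write exactly 6 four-option multiple-choice questions. Balance question types
(factual, causal, comparative, ...), vary which semantic role is asked about
(agent, patient, time, location), and make the three incorrect options
counterfactual but plausible. For each item, give a step-by-step chain of
thought before the final answer. Return a JSON list of objects with the keys
"question", "options", "chain_of_thought", and "answer".

Passage:
{passage}
"""

def generate_qa_cot(passage: str, generate: Callable[[str], str]) -> list[dict]:
    """One generation step: prompt the LLM and parse its structured output.
    `generate` is any text-in/text-out call to a local or hosted model."""
    raw = generate(PROMPT_TEMPLATE.format(passage=passage))
    return json.loads(raw)  # a real pipeline would validate and retry on malformed JSON

def build_dataset(document: str, generate: Callable[[str], str], out_path: str) -> None:
    """Document -> segments -> QA-CoT samples -> JSONL export, ready for SFT."""
    with open(out_path, "w", encoding="utf-8") as f:
        for passage in segment_document(document):
            for sample in generate_qa_cot(passage, generate):
                sample["source_passage"] = passage
                f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```

Read this way, most of the framework's leverage sits in the prompt itself: the diversity controls and the CoT requirement are packed into one structured-output instruction, which is consistent with the paper's claim that no training step is needed and that an 8B model on consumer-grade hardware suffices.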
Problem

Research questions and friction points this paper is trying to address.

High-quality, domain-specific QA data is scarce and expensive to construct, hindering SFT of domain LLMs
Existing QA generation offers limited control over question diversity and relevance
Producing QA-CoT training data efficiently on consumer-grade hardware remains difficult
Innovation

Methods, ideas, or system contributions that make the work stand out.

Document-centric processing and segmentation
Chain-of-Thought reasoning for QA generation
Structured export for domain-specific datasets (one example record is sketched below)
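To make the structured-export bullet concrete, here is one QA-CoT record as it might appear in the JSONL output of the sketch above. The field names are the ones assumed there, and the example content is paraphrased from this page's own summary rather than taken from the paper's released data.

```python
# Hypothetical exported record (field names follow the sketch above).
example_record = {
    "question": "According to the passage, what mainly limits domain-specific SFT?",
    "options": [
        "A. The scarcity and cost of high-quality QA data",  # correct
        "B. The lack of open-source base models",            # counterfactual distractor
        "C. Insufficient GPU memory in data centers",        # counterfactual distractor
        "D. The absence of domain-specific tokenizers",      # counterfactual distractor
    ],
    "chain_of_thought": (
        "The passage says high-quality QA data is scarce and expensive to "
        "construct and that this hinders SFT; the other options are never "
        "mentioned, so A is the supported choice."
    ),
    "answer": "A",
    "source_passage": "High-quality domain-specific question-answering (QA) data is scarce ...",
}
```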
Weibo Zhou (City University of Hong Kong)
Lingbo Li (University of Warwick)
Shangsong Liang (Sun Yat-sen University)
Natural Language Processing · Machine Learning · Information Retrieval