ExpertGenQA: Open-ended QA generation in Specialized Domains

📅 2025-03-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Generating high-quality open-ended question-answer pairs for specialized domains (e.g., railway safety regulations) remains challenging: existing approaches underuse expert demonstrations and struggle to balance thematic diversity. This paper proposes a generation protocol that integrates few-shot learning with structured topic and style classification. The authors empirically uncover a systematic surface-level stylistic bias in large language model (LLM) evaluators, while showing that their method better preserves the cognitive complexity distribution of expert-written questions. The method incorporates Bloom's-taxonomy-based cognitive hierarchy analysis, domain adaptation grounded in authentic regulatory documents, and a retrieval-augmented evaluation framework. Experiments show that the approach achieves twice the generation efficiency of baseline methods and attains 94.4% topic coverage, and fine-tuning a retrieval model on the generated queries boosts top-1 accuracy by 13.02%, significantly enhancing downstream task performance.

📝 Abstract
Generating high-quality question-answer pairs for specialized technical domains remains challenging, with existing approaches facing a tradeoff between leveraging expert examples and achieving topical diversity. We present ExpertGenQA, a protocol that combines few-shot learning with structured topic and style categorization to generate comprehensive domain-specific QA pairs. Using U.S. Federal Railroad Administration documents as a test bed, we demonstrate that ExpertGenQA achieves twice the efficiency of baseline few-shot approaches while maintaining 94.4% topic coverage. Through systematic evaluation, we show that current LLM-based judges and reward models exhibit strong bias toward superficial writing styles rather than content quality. Our analysis using Bloom's Taxonomy reveals that ExpertGenQA better preserves the cognitive complexity distribution of expert-written questions compared to template-based approaches. When used to train retrieval models, our generated queries improve top-1 accuracy by 13.02% over baseline performance, demonstrating their effectiveness for downstream applications in technical domains.
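The core generation idea described above, sampling expert demonstrations that match a chosen topic and style bucket and packing them into a few-shot prompt, can be sketched as follows. The bucket names and example QA pairs here are hypothetical placeholders, not the paper's actual taxonomy or prompt template:

```python
import random

# Hypothetical expert demonstrations, keyed by (topic, style) bucket.
# The real protocol draws these from expert-written railway-regulation QA pairs.
EXPERT_EXAMPLES = {
    ("track_inspection", "procedural"): [
        ("How often must Class 4 track be inspected?",
         "At least twice weekly, with inspections on separate days."),
    ],
    ("signal_systems", "definitional"): [
        ("What constitutes a false proceed signal failure?",
         "Any indication more permissive than conditions actually allow."),
    ],
}

def build_fewshot_prompt(topic, style, passage, k=1, rng=None):
    """Assemble a few-shot prompt from expert demonstrations matching the
    requested (topic, style) bucket, then ask for a new QA pair grounded
    in the given source passage."""
    rng = rng or random.Random(0)
    pool = EXPERT_EXAMPLES.get((topic, style), [])
    shots = rng.sample(pool, min(k, len(pool)))
    parts = [f"Topic: {topic} | Style: {style}"]
    for q, a in shots:
        parts.append(f"Example Q: {q}\nExample A: {a}")
    parts.append(f"Source passage:\n{passage}")
    parts.append("Write one new question-answer pair in the same style.")
    return "\n\n".join(parts)
```

Iterating over all (topic, style) buckets is what drives the topic-coverage number: each bucket is prompted separately rather than hoping a single generic prompt spans every theme.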
Problem

Research questions and friction points this paper is trying to address.

Generating high-quality QA pairs for specialized technical domains
Balancing the use of expert examples against topical diversity
Improving retrieval model accuracy for downstream technical applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines few-shot learning with structured topic and style categorization
Achieves twice the generation efficiency of baselines with 94.4% topic coverage
Generated queries improve retrieval top-1 accuracy by 13.02%
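The retrieval gain claimed above is a top-1 accuracy improvement. A minimal sketch of that metric, paired with a toy word-overlap ranker standing in for the paper's fine-tuned retrieval model:

```python
def top1_accuracy(queries, gold, rank_fn):
    """Fraction of queries whose highest-ranked passage id matches the
    gold (relevant) passage id. rank_fn(query) returns passage ids
    ordered best-first. A generic sketch of the metric, not the
    paper's evaluation code."""
    hits = 0
    for q in queries:
        ranking = rank_fn(q)
        if ranking and ranking[0] == gold[q]:
            hits += 1
    return hits / len(queries) if queries else 0.0

# Toy corpus and ranker: score passages by shared-word count with the query.
passages = {
    "p1": "track inspection frequency requirements",
    "p2": "signal failure definition and reporting",
}

def toy_rank(query):
    words = set(query.lower().split())
    return sorted(passages, key=lambda pid: -len(words & set(passages[pid].split())))
```

In the paper's setup, the comparison is between this metric computed with a baseline retriever and with one fine-tuned on the generated queries; the 13.02% figure is the gap between those two runs.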