🤖 AI Summary
Generating high-quality open-domain question-answer pairs for specialized domains (e.g., railway safety regulations) remains difficult: existing approaches underuse expert demonstrations and struggle to balance thematic diversity. This paper proposes the first generative protocol to integrate few-shot learning with structured topic and style classification. We empirically uncover a systematic surface-level stylistic bias in large language model (LLM) evaluators, while demonstrating our method's superior fidelity in preserving cognitive-complexity distributions. Our method incorporates Bloom's taxonomy-based cognitive hierarchy analysis, domain adaptation grounded in authentic regulatory documents, and a retrieval-augmented evaluation framework. Experiments show that our approach achieves twice the generation efficiency of baseline methods and attains 94.4% topic coverage. Fine-tuning a retrieval model on the generated queries boosts top-1 accuracy by 13.02%, significantly enhancing downstream task performance.
📝 Abstract
Generating high-quality question-answer pairs for specialized technical domains remains challenging, with existing approaches facing a tradeoff between leveraging expert examples and achieving topical diversity. We present ExpertGenQA, a protocol that combines few-shot learning with structured topic and style categorization to generate comprehensive domain-specific QA pairs. Using U.S. Federal Railroad Administration documents as a test bed, we demonstrate that ExpertGenQA achieves twice the efficiency of baseline few-shot approaches while maintaining 94.4% topic coverage. Through systematic evaluation, we show that current LLM-based judges and reward models exhibit strong bias toward superficial writing styles rather than content quality. Our analysis using Bloom's Taxonomy reveals that ExpertGenQA better preserves the cognitive complexity distribution of expert-written questions compared to template-based approaches. When used to train retrieval models, our generated queries improve top-1 accuracy by 13.02% over baseline performance, demonstrating their effectiveness for downstream applications in technical domains.
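The top-1 accuracy figure above refers to how often a retrieval model ranks the correct document first for a given query. As a minimal sketch (not the authors' evaluation code), assuming dense embeddings scored by cosine similarity, the metric can be computed like this; the function name and toy data are hypothetical:

```python
import numpy as np

def top1_accuracy(query_vecs: np.ndarray, doc_vecs: np.ndarray, gold: list[int]) -> float:
    """Fraction of queries whose highest-scoring document is the gold one.

    query_vecs: (n_queries, d) query embeddings
    doc_vecs:   (n_docs, d) document embeddings
    gold[i]:    index of the correct document for query i
    """
    # Cosine similarity = dot product of L2-normalized vectors.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    best = (q @ d.T).argmax(axis=1)  # best-matching doc index per query
    return float((best == np.asarray(gold)).mean())

# Toy example: 3 queries, 4 documents, 2-d embeddings (illustrative only).
queries = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
docs = np.array([[0.9, 0.1], [0.1, 0.9], [0.7, 0.7], [-1.0, 0.0]])
print(top1_accuracy(queries, docs, gold=[0, 1, 2]))  # → 1.0
```

The reported 13.02% gain is the difference in this metric before and after fine-tuning the retriever on the generated queries.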