Traceable Cross-Source RAG for Chinese Tibetan Medicine Question Answering

📅 2026-02-05

🤖 AI Summary

This work addresses knowledge-intensive question answering in Tibetan medicine, where conventional retrieval-augmented generation (RAG) systems are dominated by dense encyclopedia content and struggle to integrate sparse yet authoritative evidence from heterogeneous sources such as classical texts and clinical literature, which leads to poor traceability and a higher risk of hallucination. To mitigate this source-density bias, the authors propose DAKS, a budget-aware routing mechanism, together with an alignment graph–guided evidence fusion strategy that enables coverage-aware cross-source verification instead of naive concatenation. Evaluated on a 500-query benchmark with the lightweight openPangu-Embedded-7B generator, the approach yields consistent gains in routing quality and CrossEv@5 while maintaining strong answer faithfulness and citation correctness.

📝 Abstract

Retrieval-augmented generation (RAG) promises grounded question answering, yet domain settings with multiple heterogeneous knowledge bases (KBs) remain challenging. In Chinese Tibetan medicine, encyclopedia entries are often dense and easy to match, which can dominate retrieval even when classics or clinical papers provide more authoritative evidence. We study a practical setting with three KBs (encyclopedia, classics, and clinical papers) and a 500-query benchmark (cutoff $K=5$) covering both single-KB and cross-KB questions. We propose two complementary methods to improve traceability, reduce hallucinations, and enable cross-KB verification. First, DAKS performs KB routing and budgeted retrieval to mitigate density-driven bias and to prioritize authoritative sources when appropriate. Second, we use an alignment graph to guide evidence fusion and coverage-aware packing, improving cross-KB evidence coverage without relying on naive concatenation. All answers are generated by a lightweight generator, openPangu-Embedded-7B. Experiments show consistent gains in routing quality and cross-KB evidence coverage, with the full system achieving the best CrossEv@5 while maintaining strong faithfulness and citation correctness.
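The abstract does not spell out how DAKS routes queries or sets budgets, so the following is only a minimal sketch of the general idea of budgeted retrieval across KBs, not the authors' method. It assumes a router has already produced one non-negative score per KB, splits the $K=5$ evidence slots in proportion to those scores, and then takes the top passages within each KB's quota. The function names (`allocate_budget`, `budgeted_retrieve`), the proportional split, and the sample scores are all hypothetical.

```python
from typing import Dict, List, Tuple

K = 5  # retrieval cutoff stated in the abstract


def allocate_budget(route_scores: Dict[str, float], k: int = K) -> Dict[str, int]:
    """Split k evidence slots across KBs in proportion to routing scores.

    Assumption: proportional allocation with largest-remainder rounding.
    The paper's actual budgeting rule is not given in the abstract.
    """
    total = sum(route_scores.values())
    if total <= 0:
        raise ValueError("at least one KB needs a positive routing score")
    exact = {kb: k * score / total for kb, score in route_scores.items()}
    budget = {kb: int(share) for kb, share in exact.items()}
    leftover = k - sum(budget.values())
    # Hand remaining slots to the KBs with the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda kb: exact[kb] - budget[kb], reverse=True)
    for kb in by_remainder[:leftover]:
        budget[kb] += 1
    return budget


def budgeted_retrieve(
    candidates: Dict[str, List[Tuple[str, float]]],
    budget: Dict[str, int],
) -> List[Tuple[str, str, float]]:
    """Keep the top-scoring passages of each KB, capped by that KB's quota."""
    picked = []
    for kb, quota in budget.items():
        ranked = sorted(candidates.get(kb, []), key=lambda p: p[1], reverse=True)
        picked.extend((kb, pid, score) for pid, score in ranked[:quota])
    return picked


# Hypothetical example: the router favors the classics KB for this query,
# even though encyclopedia passages have higher raw similarity scores.
scores = {"encyclopedia": 1.0, "classics": 3.0, "clinical": 1.0}
pool = {
    "encyclopedia": [("e1", 0.9), ("e2", 0.8), ("e3", 0.7)],
    "classics": [("c1", 0.4), ("c2", 0.3), ("c3", 0.2)],
    "clinical": [("p1", 0.5)],
}
evidence = budgeted_retrieve(pool, allocate_budget(scores))
```

A single global top-5 over `pool` would return three encyclopedia passages plus one clinical and one classics passage. With per-KB quotas, the classics passages keep their slots despite lower similarity scores, which is the density-bias effect the abstract describes.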

Problem

Research questions and friction points this paper is trying to address.

Retrieval-augmented generation

Chinese Tibetan medicine

heterogeneous knowledge bases

cross-source question answering

traceability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation

Cross-Source RAG

Knowledge Base Routing