🤖 AI Summary
To address retrieval challenges in biomedical question answering caused by the lexical diversity and semantic ambiguity of domain-specific terminology, this paper proposes BMQExpander, an unsupervised, semantics-driven query expansion method. It integrates structured ontological knowledge - definitions and semantic relationships - from the UMLS Metathesaurus into large language models (LLMs) to enable controllable, low-hallucination query rewriting and expansion, seamlessly supporting both sparse and dense retrievers. Its key innovation lies in the first explicit incorporation of a canonical biomedical ontology into an LLM-based query expansion framework, balancing factual accuracy with generative flexibility. Evaluated on NFCorpus, TREC-COVID, and SciFact, the method achieves up to a 22.1% improvement in NDCG@10 over sparse baselines and a 6.5% gain over the strongest baseline; under query perturbations, it achieves up to a 15.7% improvement over the strongest baseline. The authors also publicly release their paraphrased benchmark datasets.
📝 Abstract
Effective Question Answering (QA) over large biomedical document collections depends on strong document retrieval, which remains challenging due to the domain-specific vocabulary and semantic ambiguity in user queries. We propose BMQExpander, a novel ontology-aware query expansion pipeline that combines medical knowledge - definitions and relationships - from the UMLS Metathesaurus with the generative capabilities of large language models (LLMs) to enhance retrieval effectiveness. We implemented several state-of-the-art baselines, including sparse and dense retrievers, query expansion methods, and biomedical-specific solutions. We show that BMQExpander achieves superior retrieval performance on three popular biomedical Information Retrieval (IR) benchmarks - NFCorpus, TREC-COVID, and SciFact - with improvements of up to 22.1% in NDCG@10 over sparse baselines and up to 6.5% over the strongest baseline. Further, BMQExpander generalizes robustly under query perturbation settings, in contrast to supervised baselines, achieving up to 15.7% improvement over the strongest baseline. As a side contribution, we publish our paraphrased benchmarks. Finally, our qualitative analysis shows that BMQExpander produces fewer hallucinations than other LLM-based query expansion baselines.
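To make the core idea concrete, below is a minimal, hypothetical sketch of ontology-grounded query expansion. It is not the authors' implementation: a tiny in-memory dictionary stands in for a UMLS Metathesaurus lookup, and simple string templating stands in for the LLM rewriting step; all names here are illustrative assumptions.

```python
# Illustrative sketch of ontology-aware query expansion (NOT the paper's code).
# MINI_ONTOLOGY is a hypothetical stand-in for a UMLS concept lookup:
# term -> (definition, related terms / synonyms).
MINI_ONTOLOGY = {
    "myocardial infarction": (
        "necrosis of heart muscle caused by loss of blood supply",
        ["heart attack", "MI", "coronary thrombosis"],
    ),
}

def expand_query(query: str, ontology: dict) -> str:
    """Append definitions and synonyms of any ontology terms found in the
    query. Grounding the expansion in canonical definitions is what keeps
    a downstream LLM rewrite (omitted here) close to factual vocabulary."""
    expansions = []
    for term, (definition, related) in ontology.items():
        if term in query.lower():
            expansions.append(definition)
            expansions.extend(related)
    if not expansions:
        return query  # no known concepts: leave the query unchanged
    return f"{query} ({'; '.join(expansions)})"

print(expand_query("treatment after myocardial infarction", MINI_ONTOLOGY))
```

In the full pipeline described above, the grounded expansion text would be passed to an LLM prompt rather than concatenated verbatim, and the result fed to a sparse or dense retriever.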