🤖 AI Summary
This work addresses hallucination in large language models applied to biomedical generative retrieval by proposing a traceable generation framework built on retrieval-augmented generation (RAG). The approach was evaluated in the TREC 2025 BioGen track, the first benchmark dedicated to biomedical generative retrieval, using novel evaluation metrics that emphasize evidential support and factual consistency to systematically assess how well generated content aligns with authoritative scientific literature. Preliminary results reveal substantial deficiencies in the factual reliability of current models, underscoring the critical need for high-confidence, traceable biomedical generation systems. The study thus offers a promising path toward trustworthy AI in high-stakes clinical decision-making.
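To make the framework concrete, here is a minimal sketch of what a traceable RAG pipeline of this kind could look like: retrieve supporting passages for a question, then attach each generated statement to the identifier of the passage that backs it. The paper does not specify its retriever, generator, or citation scheme, so the TF-IDF retriever, the toy two-document corpus, and the PMID-style identifiers below are all illustrative assumptions.

```python
"""Minimal sketch of a traceable RAG pipeline (illustrative only).

The retriever, corpus, and citation format are assumptions, not the
paper's actual implementation. Requires scikit-learn.
"""
from dataclasses import dataclass

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


@dataclass
class Evidence:
    doc_id: str   # e.g., a PubMed-style ID, so every claim stays traceable
    passage: str


# Hypothetical mini-corpus standing in for a biomedical literature index.
CORPUS = [
    Evidence("PMID:0000001", "Metformin lowers hepatic glucose production."),
    Evidence("PMID:0000002", "Statins reduce LDL cholesterol levels."),
]


def retrieve(question: str, corpus: list[Evidence], k: int = 1) -> list[Evidence]:
    """Rank passages by TF-IDF cosine similarity to the question."""
    texts = [e.passage for e in corpus]
    tfidf = TfidfVectorizer().fit(texts + [question])
    scores = cosine_similarity(tfidf.transform([question]),
                               tfidf.transform(texts))[0]
    ranked = sorted(zip(scores, corpus), key=lambda pair: pair[0], reverse=True)
    return [evidence for _, evidence in ranked[:k]]


def answer_with_citations(question: str) -> str:
    """Generate an answer in which every statement carries its source ID.

    A real system would prompt an LLM constrained to the retrieved text;
    echoing the evidence keeps this sketch self-contained and runnable.
    """
    support = retrieve(question, CORPUS)
    return " ".join(f"{e.passage} [{e.doc_id}]" for e in support)


print(answer_with_citations("How does metformin affect glucose levels?"))
# -> Metformin lowers hepatic glucose production. [PMID:0000001]
```

Keeping the document identifier attached to every generated statement is what makes the output auditable: a reader or an automatic evaluator can check each claim against the cited source.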
📝 Abstract
Recent advances in large language models (LLMs) have driven significant progress across multiple biomedical tasks, including biomedical question answering, lay-language summarization of the biomedical literature, and clinical note summarization. These models have demonstrated strong capabilities in processing and synthesizing complex biomedical information and in generating fluent, human-like responses. Despite these advancements, hallucinations, or confabulations, remain a key challenge when using LLMs in biomedical and other high-stakes domains. Inaccuracies can be particularly harmful in high-risk situations such as answering medical questions, making clinical decisions, or appraising biomedical research. Studies evaluating LLMs' ability to ground generated statements in verifiable sources have shown that models perform significantly below the level of reliability these settings demand.
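One simple way to operationalize that kind of grounding check is to score each generated statement against the passage it cites and report the supported fraction. The sketch below is a hedged illustration: token overlap stands in for the entailment or factual-consistency model a real evaluation would use, and the function names, threshold, and example pairs are all hypothetical.

```python
"""Hypothetical evidential-support score: the fraction of generated
statements actually supported by the passages they cite. Token overlap
is a crude stand-in for an NLI/entailment judge."""


def supported(statement: str, passage: str, threshold: float = 0.6) -> bool:
    """Crude support test: share of statement tokens found in the passage."""
    s_tokens = set(statement.lower().split())
    p_tokens = set(passage.lower().split())
    return len(s_tokens & p_tokens) / max(len(s_tokens), 1) >= threshold


def evidential_support(claims: list[tuple[str, str]]) -> float:
    """claims: (generated statement, text of the passage it cites) pairs."""
    return sum(supported(s, p) for s, p in claims) / max(len(claims), 1)


claims = [
    ("Metformin lowers hepatic glucose production.",
     "Metformin lowers hepatic glucose production."),   # grounded
    ("Metformin cures type 1 diabetes.",
     "Metformin lowers hepatic glucose production."),   # unsupported
]
print(f"Evidential support: {evidential_support(claims):.2f}")  # 0.50
```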