Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution

📅 2026-03-05
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of hallucinations in large language model–generated content in the biomedical domain by introducing Med-V1, a lightweight language model family with only 3 billion parameters. By constructing high-quality synthetic training data and unifying five biomedical verification tasks into a single evidence attribution format, Med-V1 outperforms its base models by 27.0%–71.3% across multiple benchmarks, approaching the performance of GPT-5. This study is the first demonstration of a small model achieving both high-fidelity interpretability and scalable deployment for biomedical claim verification. It further pioneers two novel applications: quantifying hallucination rates under varying citation instructions, and automatically detecting high-risk evidence misuse in clinical practice guidelines.
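The summary describes a unified format in which five verification tasks are cast as evidence attribution: the model receives a claim together with a candidate article and returns a support verdict plus an explanation. Below is a minimal sketch of how a 3B model of this kind might be invoked with Hugging Face transformers; the checkpoint name, prompt template, and label vocabulary are illustrative assumptions, not the released interface (see the repository linked in the abstract for the actual usage).

```python
# Hypothetical usage sketch for a Med-V1-style evidence attribution model.
# The model ID, prompt template, and label set below are assumptions for
# illustration; consult https://github.com/ncbi-nlp/Med-V1 for the real
# checkpoints and prompt format.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ncbi-nlp/Med-V1-3B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def attribute_evidence(claim: str, article: str) -> str:
    """Ask the model whether `article` supports `claim`, with an explanation."""
    prompt = (
        "You are a biomedical evidence attribution assistant.\n"
        f"Claim: {claim}\n"
        f"Article: {article}\n"
        "Does the article support the claim? Answer SUPPORT, REFUTE, or "
        "NOT ENOUGH INFORMATION, then briefly explain."
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens (the verdict and rationale).
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

print(attribute_evidence(
    "Metformin reduces HbA1c in patients with type 2 diabetes.",
    "In a randomized trial, metformin lowered HbA1c by 1.1% versus placebo.",
))
```

Greedy decoding (do_sample=False) is used so the verdict is deterministic; in practice, the repository's own prompt format should be preferred over this sketch.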

📝 Abstract
Assessing whether an article supports an assertion is essential for hallucination detection and claim verification. While large language models (LLMs) have the potential to automate this task, achieving strong performance requires frontier models such as GPT-5 that are prohibitively expensive to deploy at scale. To perform biomedical evidence attribution efficiently, we present Med-V1, a family of small language models with only three billion parameters. Trained on high-quality synthetic data newly developed in this study, Med-V1 substantially outperforms its base models (+27.0% to +71.3%) on five biomedical benchmarks unified into a verification format. Despite its smaller size, Med-V1 performs comparably to frontier LLMs such as GPT-5 while also producing high-quality explanations for its predictions. We use Med-V1 to conduct a first-of-its-kind use case study that quantifies hallucinations in LLM-generated answers under different citation instructions. Results show that the citation format instruction strongly affects citation validity and hallucination, with GPT-5 generating more claims but exhibiting hallucination rates similar to GPT-4o. Additionally, we present a second use case showing that Med-V1 can automatically identify high-stakes evidence misattributions in clinical practice guidelines, revealing potentially negative public health impacts that are otherwise challenging to identify at scale. Overall, Med-V1 provides an efficient and accurate lightweight alternative to frontier LLMs for practical, real-world applications in biomedical evidence attribution and verification. Med-V1 is available at https://github.com/ncbi-nlp/Med-V1.
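One of the paper's two use cases is quantifying hallucination rates in LLM-generated answers: each generated claim is paired with the article it cites, and the attribution model flags pairs in which the citation does not actually support the claim. A minimal sketch of that loop is shown here, reusing the hypothetical attribute_evidence helper from the sketch above; the SUPPORT-prefixed label convention remains an assumption, not the paper's documented output schema.

```python
# Hypothetical sketch of the hallucination-quantification use case: given
# (claim, cited article) pairs extracted from an LLM's answers, count how
# often the cited article fails to support the claim. `attribute_evidence`
# is the illustrative helper defined in the earlier sketch.
def hallucination_rate(claim_citation_pairs):
    """Fraction of claims whose cited article does not support them."""
    unsupported = 0
    for claim, article in claim_citation_pairs:
        verdict = attribute_evidence(claim, article)
        if not verdict.strip().upper().startswith("SUPPORT"):
            unsupported += 1
    return unsupported / max(len(claim_citation_pairs), 1)

pairs = [
    ("Aspirin prevents colorectal cancer.", "This trial studied aspirin and cardiovascular outcomes."),
    ("Statins cure influenza.", "Statins lower LDL cholesterol in adults."),
]
print(f"Estimated hallucination rate: {hallucination_rate(pairs):.1%}")
```

Applied across answers generated under different citation instructions, a loop like this would resemble the per-instruction comparison of citation validity described in the abstract.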
Problem

Research questions and friction points this paper is trying to address.

biomedical evidence attribution
hallucination detection
claim verification
zero-shot
scalable
Innovation

Methods, ideas, or system contributions that make the work stand out.

small language models
biomedical evidence attribution
zero-shot verification
hallucination detection
synthetic training data
Authors

Qiao Jin
Division of Intramural Research, National Library of Medicine, National Institutes of Health

Yin Fang
National Institutes of Health
AI4Bioinformatics · Knowledge Graph · Language Model

Lauren He
Division of Intramural Research, National Library of Medicine, National Institutes of Health

Yifan Yang
NCBI, NLM, NIH | University of Maryland, College Park

Guangzhi Xiong
University of Virginia

Zhizheng Wang
Postdoc, Division of Intramural Research (DIR), NLM, NIH
Large Language Models · Representation Learning · Graph Data Mining · Bioinformatics

Nicholas Wan
Division of Intramural Research, National Library of Medicine, National Institutes of Health

Joey Chan
Division of Intramural Research, National Library of Medicine, National Institutes of Health

Donald C. Comeau
Division of Intramural Research, National Library of Medicine, National Institutes of Health

Robert Leaman
Staff Scientist, NCBI/NLM/NIH
Natural Language Processing · Machine Learning

Charalampos S. Floudas
Center for Cancer Research, National Cancer Institute, National Institutes of Health

Aidong Zhang
Department of Computer Science, University of Virginia

Michael F. Chiang
Director, National Eye Institute, National Institutes of Health
Artificial Intelligence · Retinopathy of Prematurity · Biomedical Informatics · Data Science · Pediatric Ophthalmology

Yifan Peng
Associate Professor at Weill Cornell Medicine
NLP · CV · Machine Learning

Zhiyong Lu
Senior Investigator, NLM; Adjunct Professor of CS, UIUC
BioNLP · Biomedical Informatics · Medical AI · Artificial Intelligence