How Important is Domain Specificity in Language Models and Instruction Finetuning for Biomedical Relation Extraction?

📅 2024-02-21
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
This study investigates whether domain specificity helps biomedical relation extraction, addressing two key questions: (1) do language models pretrained on biomedical corpora outperform those pretrained on general-domain corpora, and (2) do models instruction-finetuned on biomedical datasets outperform those finetuned on assorted general datasets, or those that are only pretrained? The authors systematically evaluate existing open-source generative models—including LLaMA and BioMedLM—across four standard benchmarks, employing instruction finetuning, zero-/few-shot inference, and cross-dataset evaluation. Surprisingly, general-domain models typically outperform biomedical-domain models, and biomedical instruction finetuning improves performance to a similar degree as general instruction finetuning despite using orders of magnitude fewer instructions. These results challenge the assumption that domain-specific pretraining is inherently superior and suggest that larger-scale biomedical instruction finetuning of general LMs may be a more cost-effective direction than building domain-specific biomedical LMs.

📝 Abstract
Cutting-edge techniques developed in the general NLP domain are often subsequently applied to the high-value, data-rich biomedical domain. The past few years have seen generative language models (LMs), instruction finetuning, and few-shot learning become foci of NLP research. As such, generative LMs pretrained on biomedical corpora have proliferated and biomedical instruction finetuning has been attempted as well, all with the hope that domain specificity improves performance on downstream tasks. Given the nontrivial effort in training such models, we investigate what, if any, benefits they have in the key biomedical NLP task of relation extraction. Specifically, we address two questions: (1) Do LMs trained on biomedical corpora outperform those trained on general domain corpora? (2) Do models instruction finetuned on biomedical datasets outperform those finetuned on assorted datasets or those simply pretrained? We tackle these questions using existing LMs, testing across four datasets. In a surprising result, general-domain models typically outperformed biomedical-domain models. However, biomedical instruction finetuning improved performance to a similar degree as general instruction finetuning, despite having orders of magnitude fewer instructions. Our findings suggest it may be more fruitful to focus research effort on larger-scale biomedical instruction finetuning of general LMs over building domain-specific biomedical LMs.
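The instruction-finetuning setup the abstract describes relies on casting relation-extraction instances as instruction/response pairs. A minimal sketch of what that formatting step might look like is below; the field names and prompt template are hypothetical illustrations, not the paper's actual format.

```python
# Hypothetical sketch: turning one biomedical relation-extraction (RE)
# instance into an instruction/response pair, the data format typically
# used when instruction-finetuning a general-purpose LM.
# The template and field names are illustrative assumptions.

def build_instruction_example(sentence: str, head: str, tail: str,
                              relation: str) -> dict:
    """Format a single RE instance as an (instruction, response) pair."""
    instruction = (
        "Identify the relation between the two entities in the sentence.\n"
        f"Sentence: {sentence}\n"
        f"Entity 1: {head}\n"
        f"Entity 2: {tail}"
    )
    return {"instruction": instruction, "response": relation}

example = build_instruction_example(
    sentence="Aspirin inhibits COX-1 activity.",
    head="Aspirin",
    tail="COX-1",
    relation="inhibits",
)
print(example["response"])  # → inhibits
```

Pairs in this shape can then be fed to any standard supervised finetuning loop; the paper's point is that comparatively few such biomedical pairs already close much of the gap to full domain-specific pretraining.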
Problem

Research questions and friction points this paper is trying to address.

Evaluating domain-specific vs general LMs for biomedical relation extraction
Assessing impact of biomedical instruction finetuning on model performance
Comparing biomedical-domain and general-domain pretraining effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shows biomedical instruction finetuning of general LMs matches the gains of general instruction finetuning with orders of magnitude fewer instructions
Finds general-domain LMs typically outperform biomedical-domain LMs on relation extraction
Proposes larger-scale biomedical instruction finetuning of general LMs as an alternative to domain-specific pretraining