Augmenting biomedical named entity recognition with general-domain resources

📅 2024-06-15

🏛️ Journal of Biomedical Informatics

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Biomedical named entity recognition (NER) suffers from scarce, costly, and noisy annotations, which leads to poor model generalization. To address this, we propose a few-shot NER method that uses general-domain pretraining corpora, specifically Wikipedia and BooksCorpus, to strengthen weakly supervised learning. The approach combines multi-stage transfer learning, curriculum-driven domain-adaptive knowledge distillation, entity-type-aware prompt tuning, and consistency regularization, which together enable effective knowledge transfer and robust pseudo-labeling without large-scale, high-quality annotations. On standard benchmarks, including BC5CDR and JNLPBA, the method improves absolute F1 by 3.2 to 5.8 percentage points over strong baselines. Notably, with only 10% of the labeled data, it surpasses fully supervised models trained on the complete annotated corpus. The framework substantially reduces dependence on expert-curated labels while remaining scalable and robust, establishing a new paradigm for low-resource biomedical NER.
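The summary names pseudo-labeling and consistency regularization but gives no implementation details. The sketch below shows one common way these two ideas are realized for token-level tagging. It is an illustration under stated assumptions, not the paper's method. The function names (`pseudo_labels`, `consistency_loss`), the 0.9 confidence threshold, the symmetric KL objective, the three-label tag set, and the probability values are all hypothetical choices made for this example.

```python
import math
from typing import List, Optional, Sequence

Probs = Sequence[float]  # one probability distribution over labels for a single token


def pseudo_labels(token_probs: Sequence[Probs], labels: Sequence[str],
                  threshold: float = 0.9) -> List[Optional[str]]:
    """Keep the argmax label for a token only if its probability clears the threshold.

    Tokens below the threshold get None and would be skipped during training.
    The 0.9 default is an assumption, not a value from the paper.
    """
    out: List[Optional[str]] = []
    for probs in token_probs:
        best = max(range(len(probs)), key=probs.__getitem__)
        out.append(labels[best] if probs[best] >= threshold else None)
    return out


def kl(p: Probs, q: Probs, eps: float = 1e-12) -> float:
    """KL(p || q) with a small epsilon to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(probs_a: Sequence[Probs], probs_b: Sequence[Probs]) -> float:
    """Mean symmetric KL between per-token predictions from two views of one sentence.

    The two views could come from, e.g., different dropout masks or input
    perturbations; which perturbation is used is an assumption here.
    """
    per_token = [0.5 * (kl(p, q) + kl(q, p)) for p, q in zip(probs_a, probs_b)]
    return sum(per_token) / len(per_token)


# Hypothetical tag set and made-up model outputs for a three-token sentence.
LABELS = ["O", "B-Chemical", "B-Disease"]
view_a = [[0.95, 0.03, 0.02], [0.05, 0.92, 0.03], [0.40, 0.35, 0.25]]
view_b = [[0.90, 0.06, 0.04], [0.10, 0.85, 0.05], [0.30, 0.30, 0.40]]

print(pseudo_labels(view_a, LABELS))     # ['O', 'B-Chemical', None]
print(consistency_loss(view_a, view_b))  # small positive value
```

In this sketch the third token is left unlabeled because no class reaches the threshold, which is how a confidence filter keeps noisy predictions out of the pseudo-labeled set. The consistency term is zero when both views agree exactly and grows as they diverge.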