A Unified Biomedical Named Entity Recognition Framework with Large Language Models

📅 2025-10-09

🤖 AI Summary

This paper targets three key challenges in biomedical named entity recognition (BioNER): nested entity identification, ambiguous boundary detection, and poor cross-lingual generalization. It proposes a unified generative framework built on large language models (LLMs). Methodologically, it introduces: (1) a symbolic tagging strategy that jointly encodes flat and nested entities, with explicit boundary annotation, in a single generated output sequence; (2) a contrastive learning-based entity selector that filters incorrect or spurious predictions using boundary-sensitive positive and negative samples; and (3) bilingual joint fine-tuning across multiple Chinese and English datasets to strengthen multilingual and multi-task generalization. Evaluated on four benchmark BioNER datasets and two unseen corpora, the approach is reported to achieve state-of-the-art performance together with robust zero-shot generalization across languages. The work positions generative modeling as a practical route to handling nested entities and cross-lingual transfer in BioNER.
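
The summary does not give the exact tag syntax, so the sketch below assumes a hypothetical inline scheme in which each entity is wrapped as `<TYPE>...</TYPE>`. Because tags may nest, one generated string can carry both flat entities and entities inside other entities, and a stack-based decoder recovers character offsets for all of them. The functions `encode` and `decode` and the tag format are illustrative assumptions, not the paper's implementation.

```python
import re
from typing import List, Tuple

# (start, end, type) with character offsets; end is exclusive.
Entity = Tuple[int, int, str]

# Hypothetical tag format: <TYPE> opens a span, </TYPE> closes it.
TAG = re.compile(r"<(/?)([A-Za-z_]+)>")


def encode(text: str, entities: List[Entity]) -> str:
    """Insert nested <TYPE>...</TYPE> markers around each non-empty entity span."""
    opens, closes = {}, {}
    for start, end, etype in entities:
        opens.setdefault(start, []).append((end, etype))
        closes.setdefault(end, []).append((start, etype))
    out = []
    for i in range(len(text) + 1):
        # Close inner spans (later start) before outer ones to keep tags well nested.
        for _, etype in sorted(closes.get(i, []), key=lambda x: -x[0]):
            out.append(f"</{etype}>")
        # Open outer spans (later end) before inner ones.
        for _, etype in sorted(opens.get(i, []), key=lambda x: -x[0]):
            out.append(f"<{etype}>")
        if i < len(text):
            out.append(text[i])
    return "".join(out)


def decode(tagged: str) -> Tuple[str, List[Entity]]:
    """Strip the markers and return the plain text plus recovered entity spans."""
    plain, stack, entities = [], [], []
    pos = length = 0
    for m in TAG.finditer(tagged):
        chunk = tagged[pos:m.start()]
        plain.append(chunk)
        length += len(chunk)
        closing, etype = m.group(1), m.group(2)
        if not closing:
            stack.append((etype, length))
        elif stack and stack[-1][0] == etype:
            _, start = stack.pop()
            entities.append((start, length, etype))
        # A closing tag that does not match the innermost open tag is skipped,
        # since generated output can be malformed.
        pos = m.end()
    plain.append(tagged[pos:])
    return "".join(plain), sorted(entities)
```

For example, the span `IL-2 gene` (DNA) containing `IL-2` (protein) in "IL-2 gene expression" encodes as `<DNA><protein>IL-2</protein> gene</DNA> expression`, and decoding returns both spans. Tolerating unmatched closing tags matters here because the tagged string comes from a generative model rather than a validator. The sketch also assumes the raw text contains no substrings that look like tags.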

📝 Abstract

Accurate recognition of biomedical named entities is critical for medical information extraction and knowledge discovery. However, existing methods often struggle with nested entities, entity boundary ambiguity, and cross-lingual generalization. In this paper, we propose a unified Biomedical Named Entity Recognition (BioNER) framework based on Large Language Models (LLMs). We first reformulate BioNER as a text generation task and design a symbolic tagging strategy to jointly handle both flat and nested entities with explicit boundary annotation. To enhance multilingual and multi-task generalization, we perform bilingual joint fine-tuning across multiple Chinese and English datasets. Additionally, we introduce a contrastive learning-based entity selector that filters incorrect or spurious predictions by leveraging boundary-sensitive positive and negative samples. Experimental results on four benchmark datasets and two unseen corpora show that our method achieves state-of-the-art performance and robust zero-shot generalization across languages. The source code is freely available at https://github.com/dreamer-tx/LLMNER.
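
The abstract mentions boundary-sensitive positive and negative samples for the entity selector but not how they are built. One plausible construction, sketched here purely as an assumption, treats each gold span as a positive and generates negatives by shifting its start and/or end by up to `max_shift` tokens, so that the selector has to reject near-miss boundaries. `boundary_negatives`, `selector_examples`, and the token-level span format are hypothetical.

```python
from typing import List, Set, Tuple

# (start token, end token exclusive, entity type)
Span = Tuple[int, int, str]


def boundary_negatives(gold: List[Span], n_tokens: int, max_shift: int = 1) -> List[Span]:
    """Create near-miss spans by shifting each gold boundary by up to max_shift tokens."""
    gold_set = set(gold)
    negatives: Set[Span] = set()
    for start, end, etype in gold:
        for ds in range(-max_shift, max_shift + 1):
            for de in range(-max_shift, max_shift + 1):
                if ds == 0 and de == 0:
                    continue  # unshifted span is the positive itself
                s, e = start + ds, end + de
                # Keep only non-empty, in-sentence spans that are not another gold span.
                if 0 <= s < e <= n_tokens and (s, e, etype) not in gold_set:
                    negatives.add((s, e, etype))
    return sorted(negatives)


def selector_examples(tokens: List[str], gold: List[Span], max_shift: int = 1):
    """Return (span text, type, label) triples: label 1 for gold, 0 for shifted spans."""
    positives = [(" ".join(tokens[s:e]), t, 1) for s, e, t in gold]
    negatives = [
        (" ".join(tokens[s:e]), t, 0)
        for s, e, t in boundary_negatives(gold, len(tokens), max_shift)
    ]
    return positives + negatives
```

With `max_shift=1`, the gold span `BRCA1 gene` in "mutations in BRCA1 gene cause cancer" yields seven shifted negatives such as `in BRCA1` and `BRCA1 gene cause`; shifts that would produce an empty span, leave the sentence, or reproduce another gold span are discarded.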

Problem

Research questions and friction points this paper is trying to address.

Handling nested entities and boundary ambiguity in biomedical text

Improving cross-lingual generalization for biomedical entity recognition

Filtering incorrect predictions using contrastive learning strategies
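
This summary does not specify the selector's training objective. A common contrastive choice is InfoNCE, which scores the positive against one or more negatives with temperature-scaled similarities and takes the negative log of the positive's softmax probability. The sketch below computes it for plain Python vectors and adds a threshold filter over selector scores. The embeddings, `temperature`, `threshold`, and the function names are assumptions for illustration, not the paper's method.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """InfoNCE loss: -log softmax probability of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


def keep_confident(spans: List[str], scores: List[float], threshold: float = 0.5) -> List[str]:
    """Drop generated spans whose (hypothetical) selector score is below threshold."""
    return [span for span, score in zip(spans, scores) if score >= threshold]
```

When the anchor matches the positive and the negative is orthogonal, the loss is close to zero (about 4.5e-5 at `temperature=0.1`); swapping the positive and negative raises it to about 10. That gap is the training signal that pushes boundary-shifted spans away from correct ones.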

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulates BioNER as a text generation task

Uses bilingual (Chinese and English) joint fine-tuning for multilingual and multi-task generalization

Introduces contrastive learning-based entity selector
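
The summary does not say how the Chinese and English datasets are combined during joint fine-tuning. A common option in multilingual training is temperature-based sampling, where corpus i is drawn with probability proportional to n_i raised to a power alpha, so smaller corpora are up-sampled relative to their size. The sketch below assumes this scheme; the dataset names, `alpha`, and the record format are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def mixing_weights(sizes: Dict[str, int], alpha: float = 0.5) -> Dict[str, float]:
    """Sampling probability proportional to size ** alpha.

    alpha = 1 samples in proportion to dataset size; smaller alpha flattens the
    distribution and up-samples small corpora.
    """
    scaled = {name: n ** alpha for name, n in sizes.items()}
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


def build_mixture(datasets: Dict[str, List[dict]], num_examples: int,
                  alpha: float = 0.5, seed: int = 0) -> List[Tuple[str, dict]]:
    """Draw one shuffled training stream that interleaves all corpora."""
    rng = random.Random(seed)
    weights = mixing_weights({name: len(rows) for name, rows in datasets.items()}, alpha)
    names = list(weights)
    # Pick a source corpus for each slot, then a random record from that corpus.
    picks = rng.choices(names, weights=[weights[n] for n in names], k=num_examples)
    return [(name, rng.choice(datasets[name])) for name in picks]
```

With 900 English and 100 Chinese examples, `alpha=0.5` gives sampling weights of 0.75 and 0.25 rather than 0.9 and 0.1, so the smaller language appears more often in the joint training stream.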
