🤖 AI Summary
This study addresses the challenge of adapting traditional sequence-labeling-based named entity recognition (NER) to the generative paradigm of large language models (LLMs), and presents the first systematic evaluation of open-source LLMs on both flat and nested NER tasks. Through experiments on standard benchmarks using parameter-efficient fine-tuning (e.g., LoRA), structured output formats (inline brackets, XML), and models across multiple scales, the work demonstrates that open-source LLMs can achieve performance comparable to, or even surpassing, that of conventional encoder-based models and GPT-3 when prompted with structured formats. The results indicate that the NER capability of these models stems from their instruction-following and generalization abilities rather than memorization of entity-label pairs, and that fine-tuning has minimal adverse impact on their general capabilities, sometimes even enhancing them.
📝 Abstract
Named entity recognition (NER) is evolving from a sequence labeling task into a generative paradigm with the rise of large language models (LLMs). We conduct a systematic evaluation of open-source LLMs on both flat and nested NER tasks. We investigate several research questions, including the performance gap between generative NER and traditional NER models, the impact of output formats, whether LLMs rely on memorization, and the preservation of general capabilities after fine-tuning. Through experiments across eight LLMs of varying scales and four standard NER datasets, we find that: (1) with parameter-efficient fine-tuning and structured output formats such as inline brackets or XML tags, open-source LLMs achieve performance competitive with traditional encoder-based models and surpass closed-source LLMs like GPT-3; (2) the NER capability of LLMs stems from instruction-following and generative power, not mere memorization of entity-label pairs; and (3) NER instruction tuning has minimal impact on the general capabilities of LLMs, even improving performance on datasets like DROP due to enhanced entity understanding. These findings demonstrate that generative NER with LLMs is a promising, user-friendly alternative to traditional methods. We release the data and code at https://github.com/szu-tera/LLMs4NER.
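To make the two structured output formats concrete, here is a minimal sketch of how a generative model's target strings might look under an inline-bracket scheme versus an XML-tag scheme. The exact serialization used in the paper may differ; the `[span|LABEL]` convention, the label names, and the helper functions below are illustrative assumptions, with entities given as `(start, end, label)` character spans.

```python
# Illustrative encoders for two structured NER output formats.
# Entity spans are (start, end, label) tuples over the input string;
# the concrete formats here are assumptions, not the paper's exact spec.

def to_inline_brackets(text: str, entities: list) -> str:
    """Render each entity as an inline "[span|LABEL]" marker."""
    out, last = [], 0
    for start, end, label in sorted(entities):
        out.append(text[last:start])          # untouched text before the entity
        out.append(f"[{text[start:end]}|{label}]")
        last = end
    out.append(text[last:])                   # trailing text after the last entity
    return "".join(out)

def to_xml_tags(text: str, entities: list) -> str:
    """Render each entity as an XML-style "<LABEL>span</LABEL>" tag."""
    out, last = [], 0
    for start, end, label in sorted(entities):
        out.append(text[last:start])
        out.append(f"<{label}>{text[start:end]}</{label}>")
        last = end
    out.append(text[last:])
    return "".join(out)

sentence = "Barack Obama visited Paris."
spans = [(0, 12, "PER"), (21, 26, "LOC")]
print(to_inline_brackets(sentence, spans))
# → [Barack Obama|PER] visited [Paris|LOC].
print(to_xml_tags(sentence, spans))
# → <PER>Barack Obama</PER> visited <LOC>Paris</LOC>.
```

Because both formats keep the entity mention embedded in the original sentence, the fine-tuned LLM only has to copy the input and insert markers, which is one reason structured formats are easier targets than free-form entity lists. Nested entities would require overlapping tags, which the XML-style format accommodates naturally.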