🤖 AI Summary
This study addresses the challenge of adapting traditional sequence-labeling-based named entity recognition (NER) to the generative paradigm of large language models (LLMs), and presents the first systematic evaluation of open-source LLMs on both flat and nested NER tasks. Through experiments on standard benchmarks using parameter-efficient fine-tuning (e.g., LoRA), structured output formats (inline brackets, XML), and models across multiple scales, the work demonstrates that open-source LLMs can achieve performance comparable to, or even surpassing, that of conventional encoder-based models and GPT-3 when prompted with structured formats. The results indicate that the NER capability of these models stems from their instruction-following and generalization abilities rather than memorization of entity-label pairs, and that fine-tuning has minimal adverse impact on their general capabilities, sometimes even enhancing them.
📝 Abstract
Named entity recognition (NER) is evolving from a sequence labeling task into a generative paradigm with the rise of large language models (LLMs). We conduct a systematic evaluation of open-source LLMs on both flat and nested NER tasks. We investigate several research questions, including the performance gap between generative NER and traditional NER models, the impact of output formats, whether LLMs rely on memorization, and the preservation of general capabilities after fine-tuning. Through experiments across eight LLMs of varying scales and four standard NER datasets, we find that: (1) with parameter-efficient fine-tuning and structured output formats such as inline brackets or XML tags, open-source LLMs achieve performance competitive with traditional encoder-based models and surpass closed-source LLMs like GPT-3; (2) the NER capability of LLMs stems from instruction-following and generative power, not mere memorization of entity-label pairs; and (3) NER instruction tuning has minimal impact on the general capabilities of LLMs, even improving performance on datasets like DROP due to enhanced entity understanding. These findings demonstrate that generative NER with LLMs is a promising, user-friendly alternative to traditional methods. We release the data and code at https://github.com/szu-tera/LLMs4NER.
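To make the two structured output formats concrete, here is a minimal sketch of how a generative model's target strings might look under an inline-bracket scheme versus an XML-tag scheme. The exact serialization used in the paper may differ; the `[span|LABEL]` convention, the label names, and the helper functions below are illustrative assumptions, with entities given as `(start, end, label)` character spans.

```python
# Illustrative encoders for two structured NER output formats.
# Entity spans are (start, end, label) tuples over the input string;
# the concrete formats here are assumptions, not the paper's exact spec.

def to_inline_brackets(text: str, entities: list) -> str:
    """Render each entity as an inline "[span|LABEL]" marker."""
    out, last = [], 0
    for start, end, label in sorted(entities):
        out.append(text[last:start])          # untouched text before the entity
        out.append(f"[{text[start:end]}|{label}]")
        last = end
    out.append(text[last:])                   # trailing text after the last entity
    return "".join(out)

def to_xml_tags(text: str, entities: list) -> str:
    """Render each entity as an XML-style "<LABEL>span</LABEL>" tag."""
    out, last = [], 0
    for start, end, label in sorted(entities):
        out.append(text[last:start])
        out.append(f"<{label}>{text[start:end]}</{label}>")
        last = end
    out.append(text[last:])
    return "".join(out)

sentence = "Barack Obama visited Paris."
spans = [(0, 12, "PER"), (21, 26, "LOC")]
print(to_inline_brackets(sentence, spans))
# → [Barack Obama|PER] visited [Paris|LOC].
print(to_xml_tags(sentence, spans))
# → <PER>Barack Obama</PER> visited <LOC>Paris</LOC>.
```

Because both formats keep the entity mention embedded in the original sentence, the fine-tuned LLM only has to copy the input and insert markers, which is one reason structured formats are easier targets than free-form entity lists. Nested entities would require overlapping tags, which the XML-style format accommodates naturally.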