GEIC: Universal and Multilingual Named Entity Recognition with Large Language Models

📅 2024-09-17
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing NER datasets suffer from limited corpus coverage, coarse-grained entity definitions, and evaluation paradigms ill-suited for LLM-based methods, hindering robust assessment and fine-tuning. To address this, we propose the Generative Extraction and In-Context Classification (GEIC) paradigm and introduce CascadeNER—a lightweight cascaded framework enabling few-shot/zero-shot, cross-lingual, and fine-grained NER. Our key contributions are threefold: (1) the first GEIC paradigm, which decouples entity boundary detection from type classification; (2) AnythingNER—the first LLM-oriented multilingual NER benchmark, covering 8 languages, 155 fine-grained types, and a dynamic type system; and (3) a compact dual-model architecture achieving SOTA performance on low-resource benchmarks (e.g., CrossNER, FewNERD) while significantly reducing computational overhead, thereby advancing general-purpose multilingual NER.

📝 Abstract
Large Language Models (LLMs) have supplanted traditional methods in numerous natural language processing tasks. Nonetheless, in Named Entity Recognition (NER), existing LLM-based methods underperform compared to baselines and require significantly more computational resources, limiting their application. In this paper, we introduce the task of generation-based extraction and in-context classification (GEIC), designed to leverage LLMs' prior knowledge and self-attention mechanisms for NER tasks. We then propose CascadeNER, a universal and multilingual GEIC framework for few-shot and zero-shot NER. CascadeNER employs model cascading to utilize two small-parameter LLMs to extract and classify independently, reducing resource consumption while enhancing accuracy. We also introduce AnythingNER, the first NER dataset specifically designed for LLMs, including 8 languages, 155 entity types and a novel dynamic categorization system. Experiments show that CascadeNER achieves state-of-the-art performance on low-resource and fine-grained scenarios, including CrossNER and FewNERD. Our work is openly accessible.
Problem

Research questions and friction points this paper is trying to address.

Designs a dataset suited to evaluating and fine-tuning LLM-based NER
Addresses coarse-grained, inconsistent entity categorization in existing datasets
Introduces a multilingual, multi-granularity NER method
Innovation

Methods, ideas, or system contributions that make the work stand out.

AnythingNER dataset designed for LLMs
CascadeNER two-stage extract-then-classify strategy
Multilingual, fine-grained entity types with dynamic categorization
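The two-stage strategy above can be sketched as a small pipeline: one model performs generation-based extraction of candidate spans, and a second classifies each span in context. This is a minimal illustration of the paradigm, not the paper's implementation; `extract_llm`, `classify_llm`, and the toy stand-ins below are assumed placeholders for the two small-parameter LLMs.

```python
# Sketch of the GEIC cascade: stage 1 extracts entity spans, stage 2
# classifies each span given its sentence context. The callables are
# hypothetical stand-ins for CascadeNER's two small LLMs.
from typing import Callable, List, Tuple

def cascade_ner(
    sentence: str,
    extract_llm: Callable[[str], List[str]],
    classify_llm: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Stage 1: generation-based extraction. Stage 2: in-context classification."""
    spans = extract_llm(sentence)
    return [(span, classify_llm(sentence, span)) for span in spans]

# Toy stand-ins so the sketch runs end-to-end; a real system would
# prompt two fine-tuned small-parameter LLMs here.
def toy_extract(sentence: str) -> List[str]:
    # Naive heuristic: treat capitalized tokens as candidate entities.
    return [w for w in sentence.split() if w.istitle()]

def toy_classify(sentence: str, span: str) -> str:
    # Trivial lookup in place of in-context type classification.
    return "LOC" if span in {"London", "Paris"} else "PER"

print(cascade_ner("Ada met Bob in London", toy_extract, toy_classify))
# → [('Ada', 'PER'), ('Bob', 'PER'), ('London', 'LOC')]
```

Decoupling the stages this way lets each model stay small: the extractor never needs to know the type ontology, and the classifier can adapt to a new or dynamic type system without retraining the extractor.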