NameTag 3: A Tool and a Service for Multilingual/Multitagset NER

📅 2025-06-06

🤖 AI Summary
This paper presents NameTag 3, a unified framework for multilingual, multi-dataset, and multi-tagset named entity recognition (NER) that covers both flat and nested entities. Methodologically, it relies on Transformer-based fine-tuning with multi-task joint training and cross-lingual transfer learning. Its key contributions are: (i) a single 355M-parameter model for flat NER in 17 languages, trained on 21 corpora with three NE tagsets; (ii) a smaller 126M-parameter model for Czech nested NER; and (iii) an open-source command-line tool and a cloud-based web service with a REST API, which can be used without local installation. NameTag 3 achieves state-of-the-art results on 21 test datasets in 15 languages and remains competitive on the rest, even against larger models. It has been integrated into the LINDAT platform, serving thousands of requests daily.

📝 Abstract
We introduce NameTag 3, an open-source tool and cloud-based web service for multilingual, multidataset, and multitagset named entity recognition (NER), supporting both flat and nested entities. NameTag 3 achieves state-of-the-art results on 21 test datasets in 15 languages and remains competitive on the rest, even against larger models. It is available as a command-line tool and as a cloud-based service, enabling use without local installation. NameTag 3 web service currently provides flat NER for 17 languages, trained on 21 corpora and three NE tagsets, all powered by a single 355M-parameter fine-tuned model; and nested NER for Czech, powered by a 126M fine-tuned model. The source code is licensed under open-source MPL 2.0, while the models are distributed under non-commercial CC BY-NC-SA 4.0. Documentation is available at https://ufal.mff.cuni.cz/nametag, source code at https://github.com/ufal/nametag3, and trained models via https://lindat.cz. The REST service and the web application can be found at https://lindat.mff.cuni.cz/services/nametag/. A demonstration video is available at https://www.youtube.com/watch?v=-gaGnP0IV8A.
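As a rough illustration of using the cloud service without local installation, the sketch below builds a request to the REST service and reads the recognized entities from the JSON response. Only the base service URL comes from the abstract. The `/api/recognize` path, the `data`, `model`, and `output` parameters, the `vertical` output format, and the `result` field of the response are assumptions modeled on other LINDAT services and should be checked against the NameTag documentation. The helper names `build_recognize_url` and `recognize` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base service URL from the abstract; the "/api" suffix is an assumption.
BASE_URL = "https://lindat.mff.cuni.cz/services/nametag/api"


def build_recognize_url(text, model=None, output=None):
    """Build a GET URL for the (assumed) `recognize` endpoint.

    `data`, `model`, and `output` are assumed parameter names; optional
    parameters are omitted so the service can apply its own defaults.
    """
    params = {"data": text}
    if model is not None:
        params["model"] = model
    if output is not None:
        params["output"] = output
    return f"{BASE_URL}/recognize?{urlencode(params)}"


def recognize(text, model=None, output=None, timeout=30):
    """Send the text to the service and return the recognized output.

    Assumes the service answers with a JSON object whose `result`
    field holds the annotated text.
    """
    url = build_recognize_url(text, model=model, output=output)
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["result"]


if __name__ == "__main__":
    # Performs a real network request; "vertical" is an assumed format name.
    print(recognize("Václav Havel was born in Prague.", output="vertical"))
```

Separating URL construction from the network call keeps the request logic easy to inspect and test offline. Longer inputs would more sensibly be sent as a POST request, if the service supports it.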
Problem

Research questions and friction points this paper is trying to address.

Multilingual named entity recognition with multiple tagsets
Support for both flat and nested entity recognition
State-of-the-art performance across 15 languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual NER tool and web service providing flat NER for 17 languages
Single 355M-parameter model for flat NER, trained on 21 corpora and three NE tagsets
Cloud-based service usable without local installation