NameTag 3: A Tool and a Service for Multilingual/Multitagset NER

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address multilingual, multi-dataset, and multi-tagset named entity recognition (NER) for both flat and nested entities, this paper presents NameTag 3, a unified NER tool and web service. Methodologically, it relies on fine-tuning Transformer-based models with joint training across corpora and tagsets and on cross-lingual transfer. Its key contributions are: (i) a single 355M-parameter model that provides flat NER for 17 languages, trained on 21 corpora and three NE tagsets; (ii) a smaller 126M-parameter model for Czech nested NER; and (iii) an open-source command-line tool and a cloud-based web service with a REST API that can be used without local installation. NameTag 3 achieves state-of-the-art results on 21 test datasets in 15 languages and remains competitive on the rest, even against larger models. It has been integrated into the LINDAT platform, serving thousands of requests daily.

📝 Abstract
We introduce NameTag 3, an open-source tool and cloud-based web service for multilingual, multidataset, and multitagset named entity recognition (NER), supporting both flat and nested entities. NameTag 3 achieves state-of-the-art results on 21 test datasets in 15 languages and remains competitive on the rest, even against larger models. It is available as a command-line tool and as a cloud-based service, enabling use without local installation. The NameTag 3 web service currently provides flat NER for 17 languages, trained on 21 corpora and three NE tagsets, all powered by a single 355M-parameter fine-tuned model, and nested NER for Czech, powered by a 126M-parameter fine-tuned model. The source code is licensed under the open-source MPL 2.0, while the models are distributed under the non-commercial CC BY-NC-SA 4.0. Documentation is available at https://ufal.mff.cuni.cz/nametag, source code at https://github.com/ufal/nametag3, and trained models via https://lindat.cz. The REST service and the web application can be found at https://lindat.mff.cuni.cz/services/nametag/. A demonstration video is available at https://www.youtube.com/watch?v=-gaGnP0IV8A.
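As a rough illustration of how the REST service mentioned in the abstract might be called, here is a minimal Python sketch. Only the base URL comes from the abstract; the `api/recognize` path, the `data`, `model`, and `output` parameter names, and the JSON response format are assumptions that should be checked against the documentation at https://ufal.mff.cuni.cz/nametag before use. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Base URL taken from the abstract. The "api/recognize" path and the
# parameter names used below are ASSUMPTIONS, not confirmed by the source.
SERVICE_URL = "https://lindat.mff.cuni.cz/services/nametag/"


def build_recognize_request(text, model=None, output="vertical"):
    """Build (but do not send) a POST request for the assumed endpoint."""
    params = {"data": text, "output": output}
    if model is not None:
        # Hypothetical: a model identifier string accepted by the service.
        params["model"] = model
    body = urlencode(params).encode("utf-8")
    return Request(SERVICE_URL + "api/recognize", data=body, method="POST")


def recognize(text, model=None):
    """Send the request and decode the reply, assumed to be JSON.

    Requires network access; not called at import time.
    """
    request = build_recognize_request(text, model)
    with urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the URL and the encoded form body before any network traffic occurs.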
Problem

Research questions and friction points this paper is trying to address.

Multilingual named entity recognition with multiple tagsets
Support for both flat and nested entity recognition
State-of-the-art performance across 15 languages
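
The difference between flat and nested entities listed above can be made concrete with a small Python sketch. It represents entities as token spans and checks whether an annotation is flat, meaning that no two spans overlap. The `Entity` type, the `is_flat` helper, and the `ORG`, `LOC`, and `PER` labels are hypothetical and chosen for illustration; they are not NameTag 3's data format or tagsets.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str  # hypothetical label, not an actual NameTag 3 tagset


def is_flat(entities):
    """Return True if no two entity spans overlap (a flat annotation)."""
    ordered = sorted(entities, key=lambda e: (e.start, e.end))
    # After sorting by start, any overlap shows up between neighbours.
    return all(a.end <= b.start for a, b in zip(ordered, ordered[1:]))


tokens = ["Charles", "University", "in", "Prague"]

# Flat: each token belongs to at most one entity.
flat = [Entity(0, 2, "ORG"), Entity(3, 4, "LOC")]

# Nested: the person name "Charles" sits inside the organization name.
nested = [Entity(0, 2, "ORG"), Entity(0, 1, "PER")]
```

With these definitions, `is_flat(flat)` holds while `is_flat(nested)` does not, which is the distinction a nested NER model has to capture.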
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual NER tool and web service providing flat NER for 17 languages
Single 355M-parameter model for flat NER
Cloud-based service usable without local installation
Jana Straková
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics
Milan Straka
Institute of Formal and Applied Linguistics, Charles University in Prague, Czech Republic
Natural language processing, neural networks