Creating a Hybrid Rule and Neural Network Based Semantic Tagger using Silver Standard Data: the PyMUSAS framework for Multilingual Semantic Annotation

📅 2026-01-14
🤖 AI Summary
This study addresses the limitations of existing semantic annotation systems, which have been hindered by the lack of large-scale multilingual evaluation benchmarks and high-quality human-annotated data within the USAS framework. We present the first large-scale multilingual semantic annotation evaluation across five languages under USAS, introducing a hybrid approach that integrates rule-based systems with neural architectures such as Transformers. Leveraging silver-standard English annotations, we train both monolingual and multilingual models to enable cross-lingual transfer. Our contributions include the release of the first USAS silver-standard English training set and a Chinese evaluation dataset. Experimental results demonstrate that neural models consistently outperform purely rule-based systems, with the hybrid method yielding further improvements. All models, datasets, and code are publicly released to advance research in multilingual semantic annotation.

📝 Abstract
Word Sense Disambiguation (WSD) has been widely evaluated using the semantic frameworks of WordNet, BabelNet, and the Oxford Dictionary of English. However, for the UCREL Semantic Analysis System (USAS) framework, no open, extensive evaluation has been performed beyond lexical coverage or single-language evaluation. In this work, we perform the largest semantic tagging evaluation of the rule-based system that uses the lexical resources in the USAS framework, covering five languages with four existing datasets and one novel Chinese dataset. To overcome the lack of manually tagged training data, we create a new silver-labelled English dataset, on which we train various monolingual and multilingual neural models. We evaluate these models in both monolingual and cross-lingual setups, compare them with their rule-based counterparts, and show how a rule-based system can be enhanced with a neural network model. The resulting neural network models, the data they were trained on, the Chinese evaluation dataset, and all of the code have been released as open resources.
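The abstract does not spell out how the rule-based system is enhanced with a neural model. One plausible reading, sketched below purely as an assumption, is a fallback scheme: keep the lexicon-based tag where the rules produce one, and use the neural prediction only where the rules return the USAS unmatched tag (Z99). The toy lexicon, the function names, the stand-in `dummy_neural_tagger`, and the tag strings are all hypothetical illustrations, not the PyMUSAS API or the authors' method.

```python
from typing import Callable, Dict, List

UNMATCHED = "Z99"  # USAS tag for tokens with no lexicon match

# Toy lexicon, invented for illustration only; tag strings are examples.
TOY_LEXICON: Dict[str, List[str]] = {
    "bank": ["I1.1", "W3"],
    "river": ["W3"],
}


def rule_based_tag(token: str, lexicon: Dict[str, List[str]]) -> str:
    """Return the first candidate tag from the lexicon, else Z99."""
    candidates = lexicon.get(token.lower())
    return candidates[0] if candidates else UNMATCHED


def hybrid_tag(
    tokens: List[str],
    lexicon: Dict[str, List[str]],
    neural_tagger: Callable[[List[str]], List[str]],
) -> List[str]:
    """Keep rule-based tags; fill unmatched positions from the neural tagger."""
    rule_tags = [rule_based_tag(token, lexicon) for token in tokens]
    if UNMATCHED not in rule_tags:
        return rule_tags  # nothing to fill in, so skip the neural model
    neural_tags = neural_tagger(tokens)
    return [
        neural if rule == UNMATCHED else rule
        for rule, neural in zip(rule_tags, neural_tags)
    ]


def dummy_neural_tagger(tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for a trained model: one tag per token."""
    return ["M1" if token.lower() == "walked" else "Z5" for token in tokens]


if __name__ == "__main__":
    print(hybrid_tag(["The", "bank", "walked"], TOY_LEXICON, dummy_neural_tagger))
```

In this sketch the neural model is only consulted for sentences that contain at least one unmatched token, so the rule-based output is left untouched wherever the lexicon already has coverage.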
Problem

Research questions and friction points this paper is trying to address.

Semantic Tagging
Word Sense Disambiguation
Multilingual Evaluation
Silver Standard Data
USAS Framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid semantic tagger
silver standard data
multilingual semantic annotation
neural-enhanced rule system
PyMUSAS
Andrew Moore
UCREL, Lancaster University, UK
Paul Rayson
UCREL, Lancaster University, UK
Dawn Archer
Manchester Metropolitan University
pragmatics, corpus linguistics, forensic linguistics, deception and its detection, (crisis) negotiation
Tim Czerniak
Centre for Language and Communication Studies, Trinity College, Dublin, Ireland
Dawn Knight
Professor in Applied Linguistics, Cardiff University, UK
Corpus Linguistics, Pragmatics, Discourse Analysis, Multimodality, E-Language
Daisy Monika Lal
UCREL, Lancaster University, UK
G. Ó Donnchadha
Independent Researcher
Mícheál Ó Meachair
Fiontar & Scoil na Gaeilge, Dublin City University, Ireland
S. Piao
UCREL, Lancaster University, UK
Elaine Uí Dhonnchadha
Centre for Language and Communication Studies, Trinity College, Dublin, Ireland
Johanna Vuorinen
Independent Researcher
Yan Yabo
Hubei University, China
Xiaobin Yang
Hubei University, China