Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP

📅 2024-07-13
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Problem: Systematic NLP research for non-English languages, such as Greek, suffers from insufficient scholarly attention and a lack of standardized criteria for evaluating language resources (LRs). Method: This study establishes a reproducible, transferable paradigm for monolingual NLP surveys, introducing a structured literature retrieval protocol, a task-level classification taxonomy, and a dual-dimension evaluation framework that jointly assesses task coverage and multidimensional LR attributes (e.g., licensing, documentation, format, update frequency). Contribution/Results: We present a panoramic analysis of Greek NLP covering 2012–2023, identifying 52 core LRs and mapping support across 18 task categories; key gaps include missing benchmarks for low-resource tasks and ambiguous licensing for 50% of LRs. The proposed methodology provides a standardized, scalable template for surveys of other low-resource languages, thereby advancing systematicity and sustainability in monolingual NLP research.

📝 Abstract
Natural Language Processing (NLP) research has traditionally focused predominantly on English, driven by the availability of resources, the size of the research community, and market demands. Recently, there has been a noticeable shift towards multilingualism in NLP, recognizing the need for inclusivity and effectiveness across diverse languages and cultures. Monolingual surveys have the potential to complement this broader trend by providing the foundational insights and resources necessary for effectively addressing the linguistic diversity of global communication. However, monolingual NLP surveys are extremely rare in the literature. This study introduces a generalizable methodology for creating systematic and comprehensive monolingual NLP surveys, aimed at streamlining the construction of such surveys and thoroughly assessing a language's NLP support. Our approach integrates a structured search protocol to avoid selection bias and ensure reproducibility, an NLP task taxonomy to organize the surveyed material coherently, and language resource (LR) taxonomies to identify potential benchmarks and highlight opportunities for improving resource availability (e.g., through better maintenance or licensing). We apply this methodology to Greek NLP (2012-2023), providing a comprehensive overview of its current state and challenges. We discuss the progress of Greek NLP and outline the Greek LRs found, classified by availability and usability, assessing language support per NLP task. The presented systematic literature review of Greek NLP serves as an application of our method that showcases the benefits of monolingual NLP surveys more broadly. Similar applications could be considered for the many languages whose progress in NLP lags behind that of well-supported languages.
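
The idea of classifying LRs by availability and usability and then assessing support per NLP task can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `LanguageResource` fields, the "usable" rule (available and clearly licensed), the function names, and the placeholder records are hypothetical and are not taken from the paper's actual taxonomies or findings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageResource:
    """Hypothetical record for one surveyed LR (fields are assumptions)."""
    name: str         # placeholder identifier, not a real resource
    task: str         # NLP task the resource supports
    available: bool   # can the resource actually be obtained?
    licensed: bool    # does it carry a clear license?


def support_per_task(resources):
    """Count, per task, all LRs and those that are available and licensed."""
    summary = defaultdict(lambda: {"total": 0, "usable": 0})
    for lr in resources:
        summary[lr.task]["total"] += 1
        if lr.available and lr.licensed:
            summary[lr.task]["usable"] += 1
    return dict(summary)


def tasks_without_benchmark_candidates(resources):
    """Return tasks with no LR that is both available and clearly licensed."""
    summary = support_per_task(resources)
    return sorted(task for task, counts in summary.items()
                  if counts["usable"] == 0)


# Placeholder data for illustration only; these are not surveyed resources.
EXAMPLE_RESOURCES = [
    LanguageResource("placeholder-corpus-a", "sentiment analysis", True, True),
    LanguageResource("placeholder-corpus-b", "sentiment analysis", True, False),
    LanguageResource("placeholder-corpus-c", "summarization", False, False),
]

if __name__ == "__main__":
    print(support_per_task(EXAMPLE_RESOURCES))
    print(tasks_without_benchmark_candidates(EXAMPLE_RESOURCES))
```

In this sketch a task with zero usable LRs is flagged as lacking a benchmark candidate, which mirrors the kind of gap the abstract says the LR taxonomies are meant to surface. A real survey would likely track more attributes than the two booleans assumed here.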
Problem

Research questions and friction points this paper is trying to address.

Natural Language Processing
Non-English Languages
Greek Language
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monolingual NLP
Greek Language
Systematic Approach