Which Are the Low-Resource Languages of the Semantic Web?

📅 2026-05-07

🤖 AI Summary
This study addresses the lack of a clear definition of low-resource languages in the Semantic Web, a gap that exacerbates inequalities in multilingual open data. The work proposes a reproducible, multi-tiered classification framework that quantifies language resources by integrating data from three Linked Open Data knowledge graphs: DBpedia, BabelNet, and Wikidata. The framework evaluates languages along three dimensions (language coverage, number of entities, and cross-lingual transfer potential) and categorizes them into low-, medium-, and high-resource tiers. Beyond assessing language resource availability in the Semantic Web, this approach provides empirical guidance for selecting target languages in cross-lingual knowledge transfer tasks.
📝 Abstract
Emerging digital technologies are exacerbating the existing divide in Open Access Data (OAD) between high- and low-resource languages, excluding many communities from the global digital transformation. Multilingual Linked Open Data Knowledge Graphs (LOD KGs) could help mitigate this divide through cross-lingual transfer; however, no clear quantitative definition of low-resource languages has yet been established in the context of LOD KGs. In this poster, we present a methodology to analyze the distribution of languages across LOD KGs and propose a preliminary multi-level categorization based on DBpedia, BabelNet, and Wikidata. This categorization is used to derive a formal definition of low-, medium-, and high-resource languages that could later guide the selection of cross-lingual transfer candidates.
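The abstract does not spell out how the tiers are computed, so the following is only a minimal sketch of one way a multi-source categorization could work, not the authors' method. Everything in it is an assumption made for illustration: the counts are placeholder numbers rather than real statistics, `kg_1` to `kg_3` stand in for DBpedia, BabelNet, and Wikidata, and the cut-offs `LOW_MAX` and `HIGH_MIN` are hypothetical.

```python
from statistics import median

# Placeholder counts (NOT real statistics): entities with a label in each
# language, per knowledge graph. kg_1..kg_3 stand in for the three sources.
TOY_COUNTS = {
    "kg_1": {"lang_a": 1000, "lang_b": 200, "lang_c": 10},
    "kg_2": {"lang_a": 900, "lang_b": 180, "lang_c": 30},
    "kg_3": {"lang_a": 2000, "lang_b": 500, "lang_c": 50},
}

# Hypothetical cut-offs on a language's share of entities, measured
# relative to the best-covered language in the same knowledge graph.
LOW_MAX = 0.05
HIGH_MIN = 0.30


def tier(share: float) -> str:
    """Map a relative coverage share to a resource tier."""
    if share < LOW_MAX:
        return "low"
    if share >= HIGH_MIN:
        return "high"
    return "medium"


def categorize(counts: dict[str, dict[str, int]]) -> dict[str, str]:
    """Assign each language a tier from its median share across sources."""
    shares: dict[str, list[float]] = {}
    for langs in counts.values():
        top = max(langs.values())  # best-covered language in this source
        for lang, n in langs.items():
            shares.setdefault(lang, []).append(n / top)
    return {lang: tier(median(vals)) for lang, vals in shares.items()}


if __name__ == "__main__":
    for lang, label in categorize(TOY_COUNTS).items():
        print(lang, label)
```

Normalizing within each source before combining keeps one very large knowledge graph from dominating the result, and the median is one simple way to tolerate a single outlying source. Other aggregation rules would be equally plausible.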
Problem

Research questions and friction points this paper is trying to address.

low-resource languages
Linked Open Data
Semantic Web
language categorization
Open Access Data
Innovation

Methods, ideas, or system contributions that make the work stand out.

low-resource languages
Linked Open Data
knowledge graphs
cross-lingual transfer
multilingual categorization