🤖 AI Summary
This study addresses the scarcity of computational resources for converting written numerals into their spoken forms in Tigrinya, a gap that has hindered language modeling and speech synthesis for the language. The work presents a systematic formalization of the verbalization rules for both cardinal and ordinal numbers in Tigrinya, including the conjunction system and scale words, and proposes a rule-based number-to-word conversion algorithm that handles numerals in common contexts such as dates, times, and currency. The algorithm is released as an open-source implementation. An evaluation of frontier large language models reveals significant gaps in their ability to verbalize Tigrinya numbers accurately, which underscores the need for explicit rule documentation. The result is a reusable foundational tool for future research and practical applications in Tigrinya natural language processing, speech technology, and accessibility.
📝 Abstract
We present a systematic formalization of Tigrinya cardinal and ordinal number verbalization, addressing a gap in computational resources for the language. This work documents the canonical rules governing the expression of numerical values in spoken Tigrinya, including the conjunction system, scale words, and special cases for dates, times, and currency. We provide a formal algorithm for number-to-word conversion and release an open-source implementation. Evaluation of frontier large language models (LLMs) reveals significant gaps in their ability to accurately verbalize Tigrinya numbers, underscoring the need for explicit rule documentation. This work serves language modeling, speech synthesis, and accessibility applications targeting Tigrinya-speaking communities.