Arithmetics-Based Decomposition of Numeral Words - Arithmetic Conditions give the Unpacking Strategy

📅 2023-12-14
🏛️ arXiv.org
🤖 AI Summary
This paper addresses the parsing of numeral words, in particular those with non-decimal or mixed-base structure (e.g., vigesimal systems), which a decomposer that only checks decimal digits cannot handle. Method: We propose a numeral decomposer that reverses Hurford's Packing Strategy. It reads through a numeral and, for each sub-numeral it finds, checks arithmetic conditions such as 2*value(S) < value(N) to decide whether to unpack that sub-numeral. The approach works for base 10, base 20, any other base, and combinations of bases, and it targets sub-numerals that can sensibly be substituted by similar numerals. Contribution/Results: All assumptions are justified by Hurford's Packing Strategy, so the decomposer does not rely on base-10 structure. It was tested on numeral systems in 254 natural languages, and a reinforcement learning algorithm was developed on top of it. The code for both algorithms and the results are open source on GitHub.
📝 Abstract
In this paper we present a novel numeral decomposer that is designed to reverse Hurford's Packing Strategy. The Packing Strategy is a model of how numeral words are formed out of smaller numeral words by recursion. The decomposer does not simply check decimal digits; it also works for numerals formed on base 20, on any other base, or even on combinations of different bases. All assumptions that we use are justified by Hurford's Packing Strategy. The decomposer reads through the numeral. When it finds a sub-numeral, it checks arithmetic conditions to decide whether or not to unpack the sub-numeral. The goal is to unpack those sub-numerals that can sensibly be substituted by similar numerals. E.g., in 'twenty-seven thousand and two hundred and six' it should unpack 'twenty-seven' and 'two hundred and six', as each of those could sensibly be replaced by any numeral from 1 to 999. Our most used condition is: if S is a substitutable sub-numeral of a numeral N, then 2*value(S) < value(N). We have tested the decomposer on numeral systems in 254 different natural languages. We also developed a reinforcement learning algorithm based on the decomposer. Both algorithms' code and the results are open source on GitHub.
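The condition quoted in the abstract, 2*value(S) < value(N), can be illustrated with a minimal Python sketch. This is not the authors' released code: the function names `passes_necessary_condition` and `filter_candidates` are hypothetical, and the numeric values of the sub-numerals are assumed to be known in advance. Note that the abstract states the inequality only as a necessary condition, so passing it does not by itself mean a sub-numeral gets unpacked.

```python
from typing import List, Sequence, Tuple


def passes_necessary_condition(sub_value: int, numeral_value: int) -> bool:
    """Check 2*value(S) < value(N) for a sub-numeral S of a numeral N.

    Hypothetical helper: the abstract gives this as a necessary condition
    for S to be substitutable, not as a sufficient one.
    """
    return 2 * sub_value < numeral_value


def filter_candidates(
    numeral_value: int, sub_numerals: Sequence[Tuple[str, int]]
) -> List[str]:
    """Keep the sub-numerals that are not ruled out by the condition."""
    return [
        word
        for word, value in sub_numerals
        if passes_necessary_condition(value, numeral_value)
    ]


# 'twenty-seven thousand and two hundred and six' has the value 27206.
# The (word, value) pairs below are assumed inputs for illustration.
candidates = [
    ("twenty-seven", 27),
    ("twenty-seven thousand", 27000),
    ("two hundred and six", 206),
]
print(filter_candidates(27206, candidates))
# ['twenty-seven', 'two hundred and six']
```

In this example 'twenty-seven thousand' is ruled out because 2 * 27000 = 54000 is not smaller than 27206, while the two sub-numerals named in the abstract remain candidates. Nothing in the check refers to a base, which is why the same test can be applied to a vigesimal form such as French 'quatre-vingt-dix' (90), where 'quatre-vingt' (80) fails it.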
Problem

Research questions and friction points this paper is trying to address.

Decompose numeral words using arithmetic conditions instead of base-10 digit checks.
Reverse Hurford's Packing Strategy by detecting factors and summands.
Induce numeral grammars for 273 languages without supervision.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Arithmetic-based numeral decomposition strategy
Unpacking factors and summands via arithmetic criteria
Incremental unsupervised grammar induction across languages
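
To make "unpacking" concrete, the following Python sketch turns the example from the abstract into a template with open slots. It is an illustration under assumptions, not the paper's algorithm: the function name `unpack`, the `_` slot marker, and the template representation are all hypothetical, and the list of sub-numerals to unpack is given by hand rather than decided by arithmetic conditions.

```python
from typing import List, Sequence, Tuple


def unpack(
    numeral: str, sub_numerals: Sequence[str], slot: str = "_"
) -> Tuple[str, List[str]]:
    """Replace each given sub-numeral with a slot marker.

    Hypothetical illustration: the caller supplies the sub-numerals to
    unpack; the decomposer described in the paper would choose them itself.
    """
    template = numeral
    # Replace longer sub-numerals first so that a short one cannot split
    # a longer one that contains it.
    for sub in sorted(sub_numerals, key=len, reverse=True):
        template = template.replace(sub, slot, 1)
    return template, list(sub_numerals)


template, parts = unpack(
    "twenty-seven thousand and two hundred and six",
    ["twenty-seven", "two hundred and six"],
)
print(template)  # _ thousand and _
print(parts)     # ['twenty-seven', 'two hundred and six']
```

The resulting template keeps 'thousand' fixed and leaves two slots, matching the abstract's remark that each unpacked part could be replaced by any numeral from 1 to 999.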