AI Summary
This paper addresses the challenge of parsing numeral expressions, particularly those exhibiting non-decimal or mixed-base structures (e.g., vigesimal systems), which conventional decimal-centric models fail to decompose semantically.
Method: We propose the first general numeral decomposition framework driven by arithmetic inequalities (e.g., 2 × val(S) < val(N)), reversing Hurford's "packing" strategy. The approach integrates arithmetic condition checking, recursive structural parsing, and reinforcement learning for end-to-end modeling. It supports arbitrary bases, including non-decimal and hybrid systems, and enables principled assessment of subnumeral substitutability.
Contribution/Results: The framework is theoretically sound and cross-linguistically applicable, eliminating reliance on base-10 assumptions. Evaluated across 254 natural language numeral systems, it demonstrates robust generalization. We release open-source implementations, including language-specific decomposers and RL training pipelines, achieving significant gains in subnumeral identification accuracy and interpretability of semantic unpacking.
Abstract
In this paper we present a novel numeral decomposer designed to reverse Hurford's Packing Strategy, a model of how numeral words are formed recursively out of smaller numeral words. The decomposer does not simply check decimal digits; it also works for numerals formed on base 20, on any other base, or on combinations of different bases. All assumptions we use are justified by Hurford's Packing Strategy. The decomposer reads through the numeral and, whenever it finds a sub-numeral, checks arithmetic conditions to decide whether to unpack it. The goal is to unpack those sub-numerals that can sensibly be substituted by similar numerals. For example, in 'twenty-seven thousand and two hundred and six' it should unpack 'twenty-seven' and 'two hundred and six', since each could sensibly be replaced by any numeral from 1 to 999. Our most frequently used condition is: if S is a substitutable sub-numeral of a numeral N, then 2*value(S) < value(N). We have tested the decomposer on the numeral systems of 254 natural languages. We also developed a reinforcement learning algorithm based on the decomposer. The code for both algorithms and the results are open source on GitHub.
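The condition 2*value(S) < value(N) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `passes_size_condition` and `filter_candidates` are hypothetical, and the numeric values are assigned by hand rather than parsed from the words. The abstract states the inequality only as a necessary condition for substitutability, so the sketch uses it as a filter that rules candidates out; the decomposer described above presumably combines it with further checks that are not shown here.

```python
def passes_size_condition(sub_value: int, numeral_value: int) -> bool:
    """Necessary condition from the abstract: 2 * value(S) < value(N)."""
    return 2 * sub_value < numeral_value


# (sub-numeral S, value(S), containing numeral N, value(N)).
# Values are hand-assigned for illustration (an assumption of this sketch);
# a real decomposer would have to obtain them from the numeral words.
CANDIDATES = [
    ("twenty-seven", 27, "twenty-seven thousand and two hundred and six", 27206),
    ("two hundred and six", 206, "twenty-seven thousand and two hundred and six", 27206),
    ("twenty", 20, "twenty-seven", 27),
    ("hundred", 100, "two hundred", 200),
]


def filter_candidates(candidates):
    """Keep only the sub-numerals that the inequality does not rule out."""
    return [
        sub
        for sub, sub_value, _numeral, numeral_value in candidates
        if passes_size_condition(sub_value, numeral_value)
    ]


if __name__ == "__main__":
    for sub, sub_value, numeral, numeral_value in CANDIDATES:
        verdict = "kept" if passes_size_condition(sub_value, numeral_value) else "ruled out"
        print(f"{sub!r} in {numeral!r}: {verdict}")
```

Under these hand-assigned values, 'twenty-seven' and 'two hundred and six' pass the check inside 27206, while 'twenty' inside 'twenty-seven' (40 is not less than 27) and 'hundred' inside 'two hundred' (200 is not less than 200) are ruled out. The check compares values only and never refers to a base, so it applies unchanged to base-20 or mixed-base numerals.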