🤖 AI Summary
This study investigates whether large language models can learn the deterministic sequence of trees generated by iterated prime factorization of the natural numbers—each integer maps to a unique rooted planar tree, so the sequence is purely arithmetic and fully structured. Method: The authors introduce an autoregressive tree language encoding this arithmetic structure and train a GPT-2–based Transformer from scratch on the first 10¹¹ sequence elements. Contribution/Results: The model captures not only basic syntactic constraints but also nontrivial long-range structural dependencies and regular patterns. The work goes beyond conventional language modeling by showing that Transformers can learn noise-free, fully deterministic arithmetic structure, and it establishes a benchmark for probing the abstract reasoning capabilities and learnability limits of large models, offering a principled framework for assessing structured mathematical generalization.
📝 Abstract
We study whether a Large Language Model can learn the deterministic sequence of trees generated by the iterated prime factorization of the natural numbers. Each integer is mapped to a rooted planar tree, and the resulting sequence $\mathbb{N}\mathcal{T}$ defines an arithmetic text with measurable statistical structure. A transformer network (the GPT-2 architecture) is trained from scratch on the first $10^{11}$ elements, and its predictive ability is then tested on next-word and masked-word prediction tasks. Our results show that the model partially learns the internal grammar of $\mathbb{N}\mathcal{T}$, capturing non-trivial regularities and correlations. This suggests that learnability may extend beyond empirical data to the very structure of arithmetic.