🤖 AI Summary
Problem: Whether neural networks spontaneously develop symbol-like, abstract numeric variables when trained on numeric tasks remains an open question. Method: We trained Transformer and RNN models with a next-token prediction (NTP) objective, then used causal interventions, representation visualization, and measurements of symbolic alignment over the course of training to analyze when numeric variables emerge, how they change during training, and how they relate to task performance. Contribution/Results: We provide the first empirical evidence of a fundamental divergence in numeric representation between Transformers and RNNs. Numeric variables are graded rather than binary: they fall along a symbolic continuum, and their symbolic strength correlates strongly with task performance (r > 0.95). Their emergence depends critically on architecture, data distribution, and training phase. Crucially, our findings show that purely statistical learning can yield implicit symbolic structures that are interchangeable and mutable, which offers key evidence bridging neural computation and symbolic reasoning.
📝 Abstract
What types of numeric representations emerge in Neural Networks (NNs)? To what degree do NNs induce abstract, mutable, slot-like numeric variables, and in what situations do these representations emerge? How do these representations change over learning, and how can we understand the neural implementations in ways that are unified across different NNs? In this work, we approach these questions by first training sequence-based neural systems with Next Token Prediction (NTP) objectives on numeric tasks. We then seek to understand the neural solutions through the lens of causal abstractions or symbolic algorithms. Using a combination of causal interventions and visualization methods, we find that artificial neural models do indeed develop analogs of interchangeable, mutable, latent number variables purely from the NTP objective. We then ask how variations in the tasks and model architectures affect the models' learned solutions. We find that these symbol-like numeric representations do not form for every variant of the task, and that transformers solve the problem in a notably different way from their recurrent counterparts. We then track how the symbol-like variables change over the course of training and find a strong correlation between the models' task performance and the alignment of their symbol-like representations. Lastly, we show that in all cases some degree of gradience exists in these neural symbols, which highlights the difficulty of finding simple, interpretable symbolic stories of how neural networks perform numeric tasks. Taken together, our results are consistent with the view that neural networks can approximate interpretable symbolic programs of number cognition, but the particular program they approximate and the extent to which they approximate it can vary widely, depending on the network architecture, training data, extent of training, and network size.
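The causal interventions mentioned in the abstract can be illustrated with a toy interchange intervention. The sketch below is not the authors' code or model: it assumes a hand-written recurrent system whose two-dimensional hidden state is a hypothetical `(count, phase)` pair, and a hypothetical token vocabulary (`D` for demonstration, `T` for trigger, `R` for response, `EOS` for end of sequence). The idea is to copy one coordinate of the hidden state from a source run into a target run and check whether the output changes the way a symbolic "count" variable would predict.

```python
from typing import List, Tuple

# Hypothetical 2-d hidden state: (count, phase). A trained network would
# have a learned, high-dimensional state instead of this hand-built one.
State = Tuple[float, float]


def step(state: State, token: str) -> State:
    """Hand-coded recurrent update standing in for a trained model."""
    count, phase = state
    if token == "D":   # demonstration token: increment the count
        return (count + 1.0, phase)
    if token == "T":   # trigger token: switch to the response phase
        return (count, 1.0)
    if token == "R":   # response token: decrement the count
        return (count - 1.0, phase)
    raise ValueError(f"unknown token: {token}")


def encode(prompt: List[str]) -> State:
    """Run the toy model over a prompt and return its final hidden state."""
    state: State = (0.0, 0.0)
    for tok in prompt:
        state = step(state, tok)
    return state


def decode(state: State, max_len: int = 50) -> List[str]:
    """Emit response tokens until the count reaches zero, then EOS."""
    out: List[str] = []
    while len(out) < max_len:
        count, _ = state
        if count < 0.5:
            out.append("EOS")
            break
        out.append("R")
        state = step(state, "R")
    return out


def interchange(target: State, source: State, dim: int) -> State:
    """Copy coordinate `dim` of the source state into the target state."""
    patched = list(target)
    patched[dim] = source[dim]
    return (patched[0], patched[1])


# Target run saw 2 demonstrations, source run saw 5.
h_target = encode(["D"] * 2 + ["T"])
h_source = encode(["D"] * 5 + ["T"])

# Patch only the count coordinate; the phase coordinate is left alone.
h_patched = interchange(h_target, h_source, dim=0)

baseline = decode(h_target)
intervened = decode(h_patched)
print(baseline.count("R"), intervened.count("R"))
```

In this toy case the patched run produces the source's number of response tokens, which is what a fully symbolic count variable predicts. For a trained network the relevant subspace has to be searched for rather than read off, and the abstract's point about gradience is that the match to such a symbolic prediction is generally partial rather than exact.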