🤖 AI Summary
This work addresses the fundamental challenge that neural networks struggle to learn exact arithmetic algorithms and to generalize to longer inputs than seen in training. To this end, we propose the Differentiable Finite-State Transducer (DFST), a Turing-complete, differentiable model capable of constant-precision computation and constant-time output generation, with end-to-end log-parallel training. Leveraging supervised learning from expert strategy trajectories, DFST learns binary and decimal addition and multiplication from minimal training data—requiring only a few dozen examples—and achieves zero test error even when extrapolating to input lengths over one thousand times longer than those seen during training. These results show that purely gradient-based methods can induce strictly algorithmic behavior and support ultra-long sequence extrapolation. The framework suggests a paradigm for neural-symbolic integration and interpretable algorithm learning, bridging the gap between differentiable modeling and rigorous computational semantics.
📝 Abstract
We explore the possibility of exact algorithmic learning with gradient-based methods and introduce a differentiable framework capable of strong length generalization on arithmetic tasks. Our approach centers on Differentiable Finite-State Transducers (DFSTs), a Turing-complete model family that avoids the pitfalls of prior architectures by enabling constant-precision computation, constant-time generation, and end-to-end log-parallel differentiable training. Leveraging policy-trajectory observations from expert agents, we train DFSTs to perform binary and decimal addition and multiplication. Remarkably, models trained on tiny datasets generalize without error to inputs thousands of times longer than the training examples. These results show that training differentiable agents on structured intermediate supervision could pave the way towards exact gradient-based learning of algorithmic skills. Code available at https://github.com/dngfra/differentiable-exact-algorithmic-learner.git.
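To make the core idea concrete, here is a minimal sketch of a finite-state transducer in the differentiable style described above: transitions and emissions are stochastic matrices indexed by the input symbol, and the state is a probability vector updated by matrix products, so the forward pass is differentiable and each output is produced in constant time with constant precision. The parameter layout, function names, and the hand-set binary-addition example are illustrative assumptions, not the paper's exact model or training setup.

```python
import numpy as np

N_STATES = 2  # the state is the carry bit: 0 or 1
INPUTS = [(a, b) for a in (0, 1) for b in (0, 1)]  # aligned digit pairs

# Hard (one-hot) parameters implementing binary addition, LSB first.
# In training these would be soft matrices learned by gradient descent.
T = np.zeros((len(INPUTS), N_STATES, N_STATES))  # T[x, s, s'] = P(s' | s, x)
E = np.zeros((len(INPUTS), N_STATES, 2))         # E[x, s, y]  = P(y | s, x)
for i, (a, b) in enumerate(INPUTS):
    for c in range(N_STATES):
        T[i, c, (a + b + c) // 2] = 1.0  # next carry = majority(a, b, c)
        E[i, c, (a + b + c) % 2] = 1.0   # output digit = a xor b xor c

def dfst_add(x_bits, y_bits):
    """Add two binary numbers given as LSB-first bit lists."""
    n = max(len(x_bits), len(y_bits)) + 1  # one extra slot for a final carry
    x = x_bits + [0] * (n - len(x_bits))
    y = y_bits + [0] * (n - len(y_bits))
    state = np.array([1.0, 0.0])  # start with carry = 0
    out = []
    for a, b in zip(x, y):
        i = INPUTS.index((a, b))
        out.append(int(np.argmax(state @ E[i])))  # constant-time emission
        state = state @ T[i]                      # differentiable state update
    return out  # LSB-first sum bits

# 13 + 11 = 24: 13 is [1,0,1,1] LSB-first, 11 is [1,1,0,1] LSB-first.
bits = dfst_add([1, 0, 1, 1], [1, 1, 0, 1])
value = sum(d << k for k, d in enumerate(bits))
print(value)  # 24
```

Because the state update is a chain of matrix products, prefix sums over those products give the log-depth parallel evaluation the abstract mentions; the transducer itself needs only constant precision regardless of input length.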