Exact Learning of Arithmetic with Differentiable Agents

📅 2025-11-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the fundamental challenge that neural networks struggle to learn exact arithmetic algorithms and typically fail to generalize to inputs longer than those seen in training. To this end, the authors propose the Differentiable Finite-State Transducer (DFST), a Turing-complete, differentiable model capable of constant-precision computation and constant-time output generation, with end-to-end logarithmically parallel training. Using supervised learning from expert policy trajectories, DFSTs learn binary and decimal addition and multiplication from minimal training data (only a few dozen examples) and achieve zero test error even when extrapolating to inputs over a thousand times longer than those seen during training. These results suggest that purely gradient-based methods can induce strictly algorithmic behavior and support ultra-long-sequence extrapolation, pointing toward a paradigm for neural-symbolic integration and interpretable algorithm learning that bridges differentiable modeling and rigorous computational semantics.
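The constant-precision claim is easy to ground: binary addition is computable by a two-state transducer whose only memory is the carry bit, regardless of input length. The sketch below is an illustrative classical transducer for this task, not the paper's learned DFST; it reads one (bit_a, bit_b) pair per step, least-significant bit first.

```python
def fst_add(a: int, b: int) -> int:
    """Add two non-negative integers with a 2-state transducer over bit pairs.

    State = carry bit; at each step the output bit is the parity of
    (carry, bit_a, bit_b) and the next state is their majority.
    """
    bits_a = [int(c) for c in reversed(bin(a)[2:])]
    bits_b = [int(c) for c in reversed(bin(b)[2:])]
    n = max(len(bits_a), len(bits_b)) + 1      # one extra step flushes the carry
    bits_a += [0] * (n - len(bits_a))
    bits_b += [0] * (n - len(bits_b))
    carry, out = 0, []
    for x, y in zip(bits_a, bits_b):
        out.append((carry + x + y) % 2)        # emitted bit (parity)
        carry = (carry + x + y) // 2           # next state (majority)
    return sum(bit << i for i, bit in enumerate(out))
```

Because the state space is fixed (two states), precision never grows with input length, which is exactly the property that lets such a machine extrapolate to inputs far longer than its training examples.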

📝 Abstract
We explore the possibility of exact algorithmic learning with gradient-based methods and introduce a differentiable framework capable of strong length generalization on arithmetic tasks. Our approach centers on Differentiable Finite-State Transducers (DFSTs), a Turing-complete model family that avoids the pitfalls of prior architectures by enabling constant-precision computation, constant-time generation, and end-to-end log-parallel differentiable training. Leveraging policy-trajectory observations from expert agents, we train DFSTs to perform binary and decimal addition and multiplication. Remarkably, models trained on tiny datasets generalize without error to inputs thousands of times longer than the training examples. These results show that training differentiable agents on structured intermediate supervision could pave the way towards exact gradient-based learning of algorithmic skills. Code available at https://github.com/dngfra/differentiable-exact-algorithmic-learner.git.
Problem

Research questions and friction points this paper is trying to address.

Exact algorithmic learning with gradient-based methods for arithmetic tasks
Strong length generalization from tiny training datasets to much longer inputs
Differentiable framework enabling constant-precision and constant-time generation for arithmetic operations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentiable Finite-State Transducers enable exact algorithmic learning
Models generalize to inputs thousands of times longer than training data
Training uses structured intermediate supervision from expert agents
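How a finite-state transducer becomes trainable by gradient descent is not spelled out in this summary. One standard relaxation (assumed here for illustration, not necessarily the paper's exact formulation) keeps a probability distribution over states and advances it through input-conditioned stochastic transition matrices, so the trajectory is differentiable in the transition parameters.

```python
import numpy as np

n_states, n_symbols = 2, 4  # e.g. carry in {0,1}; input symbol = a bit pair
rng = np.random.default_rng(0)
logits = rng.normal(size=(n_symbols, n_states, n_states))  # learnable parameters

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def run(symbols):
    """Propagate a soft state distribution through the relaxed transducer."""
    state = np.array([1.0, 0.0])      # start deterministically in state 0
    T = softmax(logits, axis=-1)      # rows sum to 1: stochastic transitions
    for s in symbols:
        state = state @ T[s]          # differentiable state update
    return state                      # distribution over final states

dist = run([3, 0, 1])                 # hypothetical symbol encoding, e.g. 3 = (1,1)
```

Intermediate supervision from expert trajectories fits this picture naturally: a loss on the state (or output) distribution at every step, rather than only on the final answer, gives gradient signal at each transition.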