Length Generalization with Log-Depth Recurrent Units

📅 2026-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Neural networks struggle to generalize to sequence lengths beyond those seen in training: recurrent models are prone to positional bias, while Transformers are constrained by fixed computational depth. This work proposes MLP-LDRU (Multilayer Perceptron-based Logarithmic-Depth Recurrent Unit), whose associativity-biased operators approximate recurrence through parallel reduction in logarithmic depth. On 21 regular-language tasks, covering standard benchmarks and new prefix languages, MLP-LDRU achieves 100% out-of-distribution accuracy on 18 tasks and at least 99.9% on the remaining 3 when the maximum training length is increased, outperforming comparable recurrent and attention-based models. On ListOps and natural language classification benchmarks it performs competitively.
📝 Abstract
Length generalization remains a persistent challenge for neural networks: recurrent models tend to suffer from positional biases, while Transformers are constrained by fixed computational depth. Regular languages provide a frequently used testbed for evaluating length generalization, as label predictions can be checked for any sequence length. We propose MLP-LDRU, a type of Log-Depth Recurrent Unit, which captures a class of associativity-biased operators designed to approximate recurrence through parallel reduction. We evaluate MLP-LDRU on 21 regular-language tasks, consisting of standard benchmarks and new prefix languages, where it achieves 100% out-of-distribution accuracy on 18 tasks and at least 99.9% on the remaining 3 when the maximum training length is increased, outperforming comparable recurrent and attention-based models. We further evaluate MLP-LDRU beyond regular languages on ListOps and NLP classification benchmarks, where it performs competitively.
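To make "recurrence through parallel reduction" concrete, the sketch below recognizes a regular language by combining per-symbol state maps with a balanced pairwise reduction. It is not the paper's MLP-LDRU: the operator here is fixed, exact function composition rather than a learned MLP, and the parity DFA, `TRANSITIONS`, `compose`, `tree_reduce`, and `accepts` are hypothetical names chosen for illustration. The point it shows is that an associative operator lets a length-n sequence be reduced in about log2(n) levels while giving the same answer as a left-to-right scan.

```python
# Illustrative only: a fixed associative operator, not the learned MLP-LDRU operator.

# Hypothetical example DFA: accepts words with an even number of 'b' symbols.
# Each symbol is a state map: entry s holds the next state when starting in state s.
TRANSITIONS = {
    "a": (0, 1),  # 'a' leaves the state unchanged
    "b": (1, 0),  # 'b' flips between state 0 (even) and state 1 (odd)
}
IDENTITY = (0, 1)  # state map of the empty word


def compose(f, g):
    """State map for 'apply f, then g'. Composition of maps is associative."""
    return tuple(g[s] for s in f)


def tree_reduce(items, op, identity):
    """Combine items pairwise, level by level, preserving left-to-right order.

    Returns (result, depth), where depth is the number of reduction levels.
    All pairs within one level are independent and could run in parallel.
    """
    level = list(items)
    if not level:
        return identity, 0
    depth = 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd element out is carried to the next level
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


def accepts(word):
    """Return (accepted, depth) using the log-depth reduction."""
    total, depth = tree_reduce([TRANSITIONS[c] for c in word], compose, IDENTITY)
    return total[0] == 0, depth  # start in state 0; state 0 is accepting


def accepts_sequential(word):
    """Reference left-to-right scan with linear depth."""
    state = 0
    for c in word:
        state = TRANSITIONS[c][state]
    return state == 0
```

Because the reduction depends only on associativity and not on absolute positions, the same procedure applies unchanged to inputs far longer than any fixed reference length, which is the property the abstract ties to length generalization.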
Problem

Research questions and friction points this paper is trying to address.

length generalization
recurrent models
positional biases
computational depth
regular languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Length Generalization
Log-Depth Recurrent Unit
Parallel Reduction
Associativity-Biased Operators
Out-of-Distribution Generalization