Length Generalization with Log-Depth Recurrent Units

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Neural networks struggle with length generalization: recurrent models tend to suffer from positional biases, while Transformers are constrained by fixed computational depth. This work proposes MLP-LDRU, a type of Log-Depth Recurrent Unit that approximates recurrence through parallel reduction using associativity-biased operators. On 21 regular-language tasks (standard benchmarks plus new prefix languages), it achieves 100% out-of-distribution accuracy on 18 tasks and at least 99.9% on the remaining three when the maximum training length is increased, outperforming comparable recurrent and attention-based models. On ListOps and NLP classification benchmarks, it performs competitively.

📝 Abstract

Length generalization remains a persistent challenge for neural networks: recurrent models tend to suffer from positional biases, while transformers are constrained by fixed computational depth. Regular languages provide a frequently used testbed for evaluating length generalization, as label prediction can be checked for any sequence length. We propose MLP-LDRU, a type of Log-Depth Recurrent Unit, which captures a class of associativity-biased operators designed to approximate recurrence through parallel reduction. We evaluate MLP-LDRU on 21 regular-language tasks, consisting of standard benchmarks and new prefix languages, where it achieves 100% out-of-distribution accuracy on 18 tasks and at least 99.9% on the remaining 3 when increasing max training length, outperforming comparable recurrent and attention-based models. We further evaluate MLP-LDRU beyond regular languages on ListOps and NLP classification benchmarks, where it performs competitively.
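
The idea of replacing a step-by-step recurrence with a parallel reduction can be illustrated with a small sketch. This is not the paper's MLP-LDRU: the learned operator is swapped for a fixed, exactly associative one (composition of DFA transition tables), and the parity language, `TRANSITIONS`, `compose`, and `tree_reduce` are all hypothetical names chosen for illustration. It shows only why an associative operator lets a sequence of length n be combined in roughly log2(n) levels instead of n sequential steps.

```python
# Assumed example language: bit strings containing an even number of 1s.
# Each token maps to a transition table; table[s] is the next state from state s.
TRANSITIONS = {"0": (0, 1), "1": (1, 0)}
IDENTITY = (0, 1)   # transition table that leaves every state unchanged
ACCEPTING = {0}     # state 0 means "even number of 1s seen so far"


def compose(f, g):
    """Apply table f first, then table g. Composition is associative."""
    return tuple(g[s] for s in f)


def tree_reduce(items, op, identity):
    """Combine adjacent pairs level by level; return (result, number of levels)."""
    level = list(items) or [identity]
    depth = 0
    while len(level) > 1:
        # All pairs within one level are independent and could run in parallel.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # carry the unpaired last element upward
        level = nxt
        depth += 1
    return level[0], depth


def accepts_parallel(word):
    """Label a word via tree reduction; also report the reduction depth."""
    table, depth = tree_reduce([TRANSITIONS[c] for c in word], compose, IDENTITY)
    return table[0] in ACCEPTING, depth


def accepts_sequential(word):
    """Reference labeler: run the DFA one token at a time from state 0."""
    state = 0
    for c in word:
        state = TRANSITIONS[c][state]
    return state in ACCEPTING
```

Because the sequential labeler works for any input length, it can serve as a ground-truth check at lengths far beyond those seen in training, which is what makes regular languages convenient for this kind of evaluation. A learned operator is only approximately associative, so a trained model would not be guaranteed to match the reference the way this fixed operator does.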

Problem

Research questions and friction points this paper is trying to address.

length generalization

recurrent models

positional biases

computational depth

regular languages

Innovation

Methods, ideas, or system contributions that make the work stand out.

Length Generalization

Log-Depth Recurrent Unit

Parallel Reduction