Intra-Layer Recurrence in Transformers for Language Modeling

📅 2025-05-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Transformer parameter counts grow substantially as depth increases, and existing recurrent Transformer methods typically reprocess entire blocks of layers without controlling how much recurrence each layer receives. Method: The paper investigates Intra-Layer Recurrence (ILR), which, within a single forward pass, selectively re-applies individual layers multiple times, so effective depth increases without adding parameters. The number of iterations is chosen per layer, which allows compute to be allocated unevenly across depth (e.g., more iterations for earlier layers). Contribution/Results: In language modeling experiments, allocating more iterations to earlier layers yields the best results, suggesting that ILR is a promising direction for optimizing recurrent structures in Transformer architectures.

📝 Abstract

Transformer models have established new benchmarks in natural language processing; however, their increasing depth results in substantial growth in parameter counts. While existing recurrent transformer methods address this issue by reprocessing layers multiple times, they often apply recurrence indiscriminately across entire blocks of layers. In this work, we investigate Intra-Layer Recurrence (ILR), a more targeted approach that applies recurrence selectively to individual layers within a single forward pass. Our experiments show that allocating more iterations to earlier layers yields optimal results. These findings suggest that ILR offers a promising direction for optimizing recurrent structures in transformer architectures.
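The per-layer recurrence described in the abstract can be sketched as a loop that applies each layer a configurable number of times before moving on to the next. The snippet below is a minimal illustration, not the authors' implementation: the names `ilr_forward` and `reuse_map`, and the toy layers `add_one` and `double`, are hypothetical stand-ins for real Transformer layers.

```python
from typing import Callable, List, Sequence

# A "layer" here is any function mapping a hidden state to a new hidden state.
# Real Transformer layers would operate on tensors; plain lists keep the sketch
# dependency-free.
Layer = Callable[[List[float]], List[float]]


def ilr_forward(x: List[float], layers: Sequence[Layer],
                reuse_map: Sequence[int]) -> List[float]:
    """Apply layers[i] reuse_map[i] times in a row, within one forward pass.

    The same layer (and therefore the same parameters) is reused on its own
    output, so extra iterations add compute but no new parameters.
    """
    if len(layers) != len(reuse_map):
        raise ValueError("reuse_map needs exactly one entry per layer")
    if any(r < 1 for r in reuse_map):
        raise ValueError("each layer must be applied at least once")
    for layer, repeats in zip(layers, reuse_map):
        for _ in range(repeats):
            x = layer(x)
    return x


# Toy layers (assumptions for illustration only).
def add_one(x: List[float]) -> List[float]:
    return [v + 1.0 for v in x]


def double(x: List[float]) -> List[float]:
    return [v * 2.0 for v in x]


# More iterations on the earlier layer: add_one twice, then double once.
early_heavy = ilr_forward([1.0], [add_one, double], [2, 1])
# More iterations on the later layer: add_one once, then double twice.
late_heavy = ilr_forward([1.0], [add_one, double], [1, 2])
```

With a reuse map of all ones the loop reduces to an ordinary feed-forward stack, so a standard model is the special case with no recurrence.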

Problem

Research questions and friction points this paper is trying to address.

Reducing parameter growth in deep transformer models

Selectively applying recurrence within individual layers

Optimizing layer iteration allocation for better performance
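To make the parameter-versus-depth trade-off concrete, the following back-of-the-envelope sketch compares unique parameters with effective depth under a reuse map. It assumes the common rough estimate of about 12·d² weights per Transformer layer (4·d² for attention projections plus 8·d² for a feed-forward block with 4x expansion, ignoring biases, normalization, and embeddings). The function names and the example sizes are illustrative assumptions, not figures from the paper.

```python
from typing import Sequence


def approx_layer_params(d_model: int) -> int:
    """Rough per-layer weight count: 4*d^2 (attention) + 8*d^2 (FFN, 4x expansion)."""
    return 12 * d_model * d_model


def unique_params(num_layers: int, d_model: int) -> int:
    """Parameters grow linearly with the number of distinct layers."""
    return num_layers * approx_layer_params(d_model)


def effective_depth(reuse_map: Sequence[int]) -> int:
    """Total number of layer applications in one forward pass."""
    return sum(reuse_map)


d_model = 512                 # hypothetical hidden size
reuse_map = [2, 1, 1, 1]      # hypothetical: extra iteration on the first layer

ilr_params = unique_params(len(reuse_map), d_model)
ilr_depth = effective_depth(reuse_map)
# A non-recurrent model with the same effective depth needs one layer per step.
baseline_params = unique_params(ilr_depth, d_model)

print(f"ILR: {ilr_params:,} params at effective depth {ilr_depth}")
print(f"Non-recurrent: {baseline_params:,} params at depth {ilr_depth}")
```

Under these assumptions the recurrent configuration reaches five layer applications with the weights of four layers, while per-pass compute matches the five-layer stack.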

Innovation

Methods, ideas, or system contributions that make the work stand out.

Intra-Layer Recurrence selectively applied to layers

More iterations allocated to earlier layers

Optimizes recurrent structures in transformers
