🤖 AI Summary
Problem: Increasing Transformer depth leads to substantial growth in parameter count, while existing recurrent methods apply recurrence coarsely across entire blocks of layers, without fine-grained control over where computation is spent. Method: We investigate Intra-Layer Recurrence (ILR), a more targeted recurrence mechanism that, within a single forward pass, selectively re-applies individual Transformer layers multiple times, reusing each layer's weights rather than adding parameters. A per-layer iteration allocation determines how much compute each layer receives (e.g., more iterations for earlier layers). Contribution/Results: In experiments on Transformer architectures for language modeling, allocating more iterations to earlier layers yields the best results, suggesting that ILR is a promising direction for optimizing recurrent structures in Transformers.
📝 Abstract
Transformer models have established new benchmarks in natural language processing; however, their increasing depth results in substantial growth in parameter counts. While existing recurrent transformer methods address this issue by reprocessing layers multiple times, they often apply recurrence indiscriminately across entire blocks of layers. In this work, we investigate Intra-Layer Recurrence (ILR), a more targeted approach that applies recurrence selectively to individual layers within a single forward pass. Our experiments show that allocating more iterations to earlier layers yields optimal results. These findings suggest that ILR offers a promising direction for optimizing recurrent structures in transformer architectures.
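To make the idea of per-layer recurrence concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: `ToyLayer`, `ilr_forward`, and `reuse_map` are hypothetical names, the "layer" is a scalar residual update standing in for a real Transformer layer, and the allocation `[2, 1, 1, 1]` is only an illustrative choice that gives the first layer one extra iteration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToyLayer:
    """Hypothetical stand-in for a Transformer layer with a single scalar weight."""
    weight: float
    calls: int = 0  # number of times this layer has been applied

    def __call__(self, hidden: List[float]) -> List[float]:
        self.calls += 1
        # Residual form h + f(h), mirroring the skip connection of a real layer.
        return [h + self.weight * h for h in hidden]


def ilr_forward(
    layers: Sequence[ToyLayer],
    reuse_map: Sequence[int],
    hidden: List[float],
) -> List[float]:
    """Apply layer i reuse_map[i] times in a row before moving on to layer i + 1."""
    if len(layers) != len(reuse_map):
        raise ValueError("reuse_map needs exactly one entry per layer")
    if any(repeats < 1 for repeats in reuse_map):
        raise ValueError("each layer must be applied at least once")
    for layer, repeats in zip(layers, reuse_map):
        for _ in range(repeats):
            hidden = layer(hidden)  # the same weights are reused on every iteration
    return hidden


layers = [ToyLayer(weight=0.1) for _ in range(4)]
reuse_map = [2, 1, 1, 1]  # assumed allocation: one extra iteration on the first layer
output = ilr_forward(layers, reuse_map, [1.0, 2.0])

num_parameters = len(layers)      # one weight per toy layer, unchanged by recurrence
effective_depth = sum(reuse_map)  # layer applications in a single forward pass
```

In this sketch the parameter count stays at four while the forward pass performs five layer applications, which is the trade-off the abstract describes: extra computation is spent on selected layers without adding weights.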