🤖 AI Summary
This paper addresses the limited recurrent capacity of Transformers in modeling long sequences. It systematically investigates two augmentation paradigms: depth-wise recurrence (e.g., Universal Transformer) and chunk-wise temporal recurrence (e.g., Temporal Latent Bottleneck). The authors propose two key extensions: (1) a dynamic halting mechanism based on a global mean of activations, enabling an adaptive number of computation steps; and (2) an integration of depth-wise recurrence into the chunk-wise temporal framework, yielding a cross-paradigm fused architecture. Evaluation on diagnostic tasks including Long Range Arena (LRA), flip-flop language modeling, ListOps, and Logical Inference reveals significant differences in the effectiveness of the various recurrent inductive biases, and the results indicate that the dynamic halting mechanism improves both modeling accuracy and computational efficiency. The implementation is publicly available.
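The global mean-based dynamic halting idea can be illustrated with a minimal sketch: at each depth-wise recurrent step, the same layer is reapplied and a single halting probability is computed from the mean of the token activations, stopping the recursion once the accumulated probability crosses a threshold. This is an ACT-style illustration under assumed details (`shared_layer`, `w_halt`, the threshold value), not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_layer(h, W):
    # One depth-wise recurrent step: the same weights W are reused at every
    # step, as in Universal Transformer (attention omitted; a dense layer
    # stands in for the full block in this sketch).
    return np.tanh(h @ W)

def run_with_global_halting(h, W, w_halt, b_halt, threshold=0.99, max_steps=8):
    """ACT-style halting driven by one global score per step.

    Unlike per-token halting, the halt probability here comes from the mean
    activation over all tokens, so the whole sequence halts at the same
    step (hypothetical sketch of the global mean-based mechanism).
    """
    cum_halt = 0.0
    steps = 0
    for _ in range(max_steps):
        h = shared_layer(h, W)
        steps += 1
        global_mean = h.mean(axis=0)  # pool over tokens -> (d,)
        p_halt = 1.0 / (1.0 + np.exp(-(global_mean @ w_halt + b_halt)))
        cum_halt += p_halt
        if cum_halt >= threshold:
            break
    return h, steps

d = 16
h0 = rng.standard_normal((10, d))       # 10 tokens, d-dim hidden states
W = rng.standard_normal((d, d)) * 0.1
w_halt = rng.standard_normal(d)
out, steps = run_with_global_halting(h0, W, w_halt, b_halt=0.0)
```

Because a single scalar decides when to stop, the number of recurrent steps adapts to the input while every token shares the same depth, which is what makes the mechanism cheap relative to per-token halting.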
📝 Abstract
In this paper, we comprehensively study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism: (1) the approach of incorporating a depth-wise recurrence similar to Universal Transformers; and (2) the approach of incorporating a chunk-wise temporal recurrence like Temporal Latent Bottleneck. Furthermore, we propose and investigate novel ways to extend and combine the above methods; for example, we propose a global mean-based dynamic halting mechanism for Universal Transformers and an augmentation of Temporal Latent Bottleneck with elements from Universal Transformers. We compare the models and probe their inductive biases on several diagnostic tasks, such as Long Range Arena (LRA), flip-flop language modeling, ListOps, and Logical Inference. The code is released at: https://github.com/JRC1995/InvestigatingRecurrentTransformers/tree/main
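The chunk-wise temporal recurrence can be sketched as follows: the sequence is split into chunks, a small set of latent states is carried across chunks, tokens in each chunk read from the latents, and the latents are then updated from the processed chunk. This is a minimal illustration in the spirit of Temporal Latent Bottleneck; the function names, single-head attention, and omission of projections and layer norms are simplifying assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def attend(q, k, v):
    # Scaled dot-product attention (single head, no masking, no projections).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def chunked_recurrence(x, latents, chunk_size=4):
    """Chunk-wise temporal recurrence sketch: per chunk, tokens cross-attend
    to the recurrent latent states, then the latents cross-attend to the
    updated chunk, carrying information forward to the next chunk."""
    outputs = []
    for start in range(0, len(x), chunk_size):
        chunk = x[start:start + chunk_size]
        # Perceptual step: chunk tokens read from the recurrent latents.
        chunk = chunk + attend(chunk, latents, latents)
        # Temporal step: latents summarize the chunk for later chunks.
        latents = latents + attend(latents, chunk, chunk)
        outputs.append(chunk)
    return np.concatenate(outputs, axis=0), latents

d = 8
x = rng.standard_normal((12, d))       # 12 tokens
latents = rng.standard_normal((2, d))  # 2 recurrent latent states
y, final_latents = chunked_recurrence(x, latents)
```

The depth-wise variant differs in reapplying one shared block along depth; combining the two, as proposed in the paper, amounts to running such a shared recurrent block inside each chunk while the latents still carry state across chunks.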