AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling

📅 2025-07-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of fixed computational budgets in autoregressive sequence modeling, where models cannot adapt inference cost to varying test-time requirements, this paper introduces AbbIE, a block-based autoregressive iterative encoder built on the encoder-only Transformer. AbbIE recursively refines latent representations of input blocks through repeated encoding passes; it is trained with only two iterations but supports arbitrary iteration counts at inference time, without requiring additional data, fine-tuning, or specialized training protocols. This design enables fine-grained, on-the-fly trade-offs between computational cost and modeling performance. Evaluated at the 350M-parameter scale, AbbIE achieves up to a 5% reduction in language modeling perplexity and up to a 12% improvement in zero-shot in-context learning accuracy. By decoupling training cost from inference flexibility, AbbIE establishes a novel paradigm for efficient scaling of Transformer-based architectures.
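The core idea above, applying the same encoder repeatedly to a block's latent state and choosing the iteration count at inference time, can be illustrated with a minimal sketch. This is not the authors' code: `encoder_block` is a hypothetical stand-in for a full Transformer encoder layer, and the residual-style update is an assumption about how the refinement composes.

```python
import numpy as np

def encoder_block(latent, weights):
    """Stand-in for a Transformer encoder block: one linear map + nonlinearity."""
    return np.tanh(latent @ weights)

def abbie_refine(latent, weights, num_iters):
    """Iteratively refine the latent; num_iters is a free inference-time knob,
    even though training uses only 2 iterations."""
    for _ in range(num_iters):
        latent = latent + encoder_block(latent, weights)  # residual-style update
    return latent

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # latent for one block of 4 tokens, dim 8
w = rng.normal(size=(8, 8)) * 0.1    # shared encoder weights

cheap = abbie_refine(x, w, num_iters=2)    # training-time budget
costly = abbie_refine(x, w, num_iters=8)   # larger test-time budget, same weights
```

Because the weights are shared across iterations, scaling test-time compute changes only the loop count, not the parameter count; this is what lets the same trained model trade compute for quality on the fly.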

📝 Abstract
We introduce the Autoregressive Block-Based Iterative Encoder (AbbIE), a novel recursive generalization of the encoder-only Transformer architecture, which achieves better perplexity than a standard Transformer and allows for the dynamic scaling of compute resources at test time. This simple, recursive approach is a complement to scaling large language model (LLM) performance through parameter and token counts. AbbIE performs its iterations in latent space, but unlike latent reasoning models, does not require a specialized dataset or training protocol. We show that AbbIE upward generalizes (ability to generalize to arbitrary iteration lengths) at test time by only using 2 iterations during train time, far outperforming alternative iterative methods. AbbIE's ability to scale its computational expenditure based on the complexity of the task gives it an up to 12% improvement in zero-shot in-context learning tasks versus other iterative and standard methods and up to 5% improvement in language perplexity. The results from this study open a new avenue to Transformer performance scaling. We perform all of our evaluations on model sizes up to 350M parameters.
Problem

Research questions and friction points this paper is trying to address.

Improves sequence modeling efficiency with autoregressive block-based encoding
Enables dynamic compute scaling for varying task complexity
Enhances zero-shot learning and perplexity without specialized training data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoregressive block-based iterative encoder architecture
Dynamic compute scaling at test time
Latent space iterations without specialized training