Position Encoding with Random Float Sampling Enhances Length Generalization of Transformers

📅 2026-02-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited length generalization of Transformer models on inputs longer than those seen during pretraining. To mitigate the out-of-distribution behavior of positional encodings on long sequences, the authors propose Random Float Sampling (RFS), a positional encoding strategy that replaces conventional discrete position indices with continuous, randomly sampled position values during training. RFS integrates seamlessly with mainstream positional encoding methods, including sinusoidal encoding, RoPE, and ALiBi, and yields substantial improvements on both length extrapolation tasks and zero-shot commonsense reasoning benchmarks, demonstrating strong extrapolation ability and broad compatibility across architectures.

📝 Abstract
Length generalization is the ability of language models to maintain performance on inputs longer than those seen during pretraining. In this work, we introduce a simple yet powerful position encoding (PE) strategy, Random Float Sampling (RFS), that generalizes well to lengths unseen during pretraining or fine-tuning. In particular, instead of selecting position indices from a predefined discrete set, RFS uses randomly sampled continuous values, thereby avoiding out-of-distribution (OOD) issues on unseen lengths by exposing the model to diverse indices during training. Since assigning indices to tokens is a common and fundamental procedure in widely used PEs, the advantage of RFS can easily be incorporated into, for instance, the absolute sinusoidal encoding, RoPE, and ALiBi. Experiments corroborate its effectiveness by showing that RFS results in superior performance in length generalization tasks as well as zero-shot commonsense reasoning benchmarks.
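The core idea in the abstract, sampling sorted continuous position values instead of the fixed indices 0, 1, ..., n-1, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the uniform distribution, the range bound `max_pos`, and the function names are assumptions. Sinusoidal encoding is used here because, like RoPE and ALiBi, it is a smooth function of position and accepts non-integer inputs unchanged.

```python
import numpy as np

def rfs_positions(seq_len, max_pos=4096.0, rng=None):
    """Hypothetical sketch of Random Float Sampling (RFS):
    draw seq_len floats uniformly from [0, max_pos), then sort them
    so token order is preserved. The sampling distribution and the
    bound `max_pos` are assumptions, not details from the paper."""
    rng = rng or np.random.default_rng()
    return np.sort(rng.uniform(0.0, max_pos, size=seq_len))

def sinusoidal_encoding(positions, d_model):
    """Standard sinusoidal PE, evaluated at (possibly non-integer)
    positions -- no change is needed to accept continuous values."""
    positions = np.asarray(positions, dtype=np.float64)[:, None]   # (n, 1)
    dims = np.arange(0, d_model, 2, dtype=np.float64)[None, :]     # (1, d/2)
    angles = positions / np.power(10000.0, dims / d_model)         # (n, d/2)
    pe = np.empty((positions.shape[0], d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# During training, each batch sees a fresh draw of continuous positions,
# exposing the model to a dense range of indices rather than a fixed grid.
pos = rfs_positions(16, max_pos=1024.0, rng=np.random.default_rng(0))
pe = sinusoidal_encoding(pos, d_model=64)
```

At inference on a longer sequence, positions drawn (or spaced) within the same numeric range stay in-distribution for the encoding, which is the mechanism the abstract credits for avoiding OOD position indices.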
Problem

Research questions and friction points this paper is trying to address.

length generalization
position encoding
out-of-distribution
Transformer
sequence modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Random Float Sampling
Position Encoding
Length Generalization
Transformer
Out-of-Distribution Generalization