🤖 AI Summary
Despite theoretical support for infinite context, state space models (SSMs), linear RNNs, and other sequence architectures exhibit substantially degraded performance on ultra-long sequences in practice, with large inter-architectural disparities in extrapolation capability.
Method: We conduct the first systematic empirical evaluation of SSMs, linear RNNs, and Transformer variants across controlled synthetic tasks and real-world long-text benchmarks, analyzing their context scaling behavior and generalization curves.
Results: All models suffer sharp performance drops beyond certain sequence lengths, indicating a fundamental gap between theoretical infinite-context capacity and empirical efficacy. Crucially, inductive bias—not parameter count or training scale—emerges as the dominant factor governing practical long-range modeling effectiveness. Our findings challenge prevailing assumptions about asymptotic context scalability and provide attributable, evidence-based insights into the failure mechanisms of long-range dependency modeling.
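The evaluation protocol described above measures accuracy as a function of sequence length, revealing where extrapolation breaks down. A minimal sketch of such a probe is below; the synthetic copy task, the chosen lengths, and the stand-in model are illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical length-extrapolation probe: generate a synthetic task at
# several sequence lengths and record per-length accuracy. In the real
# study, `model` would be a trained SSM / linear RNN / Transformer.
import random

def make_copy_example(length, vocab=8, rng=random):
    # Synthetic "copy" task: the target is the input sequence itself,
    # so exact-match accuracy tests whether state carries information
    # across the full length.
    seq = [rng.randrange(vocab) for _ in range(length)]
    return seq, list(seq)

def evaluate(model, lengths, n_examples=32, seed=0):
    # Returns {length: exact-match accuracy}, i.e. the generalization
    # curve over increasing sequence lengths.
    rng = random.Random(seed)
    results = {}
    for L in lengths:
        correct = 0
        for _ in range(n_examples):
            x, y = make_copy_example(L, rng=rng)
            correct += int(model(x) == y)
        results[L] = correct / n_examples
    return results

# Trivial oracle stand-in for a trained model (always copies perfectly);
# a real model's curve would degrade beyond its training length.
identity_model = lambda x: list(x)

curve = evaluate(identity_model, lengths=[64, 256, 1024])
```

A trained model evaluated this way typically shows the sharp drop the summary describes: near-perfect accuracy up to (roughly) the training length, then rapid degradation at longer lengths.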
📝 Abstract
Long sequences occur in abundance within real-world scenarios, hence properly modelling them opens numerous downstream use cases. Deep neural networks, however, have often struggled with these for a variety of reasons. Recent advances, in both systems engineering and model design, have enabled the scaling up of models that are purported to support extended context lengths. In particular, the state-space and linear recurrent neural network families of models can hypothetically extend to infinite sequence length. However, is this too good to be true? We conduct an evaluation showing that while such claims may be sound theoretically, large practical gaps remain when observed empirically. In particular, recurrent models still suffer in the same settings as long-context LLMs with attention. We further show that different inductive biases have inconsistent extrapolation capabilities, highlighting the need to study such paradigms further and to investigate why long-context models seemingly fail to behave as one might expect.