A Survey on Long Text Modeling with Transformers

📅 2023-02-28
🏛️ arXiv.org
📈 Citations: 57
Influential: 4
🤖 AI Summary
To address key challenges in long-text modeling—including limited context length, difficulty capturing long-range dependencies, and complex semantic structures—this paper presents a systematic survey of Transformer-based long-text processing techniques from 2018 to 2023. Methodologically, it introduces the first formal definition of “long-text modeling” to unify task conceptualization; proposes a structured taxonomy integrating context extension and long-range dependency modeling; and establishes a comprehensive classification framework encompassing sparse attention, chunking/compressive encoding, hierarchical modeling, and enhanced positional encoding. The study also synthesizes mainstream evaluation benchmarks and traces technical evolution. Results reveal scalability and efficient inference as critical bottlenecks, and identify promising future directions: model lightweighting, dynamic context allocation, and semantics-aware compression.
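The summary's taxonomy lists sparse attention as one route to longer contexts. As a minimal illustration (not the paper's own implementation), the sketch below shows windowed local attention in NumPy: each token attends only to neighbors within a fixed window, cutting cost from O(n²) to O(n·w). The function name and window size are illustrative assumptions.

```python
import numpy as np

def local_attention(q, k, v, window=2):
    """Windowed (local) sparse attention sketch.

    Each position i attends only to positions within `window` steps,
    so the cost is O(n * window) instead of the dense O(n^2).
    q, k, v: arrays of shape (n, d).
    """
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # scaled dot-product scores restricted to the local window
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[lo:hi]
    return out
```

With `window >= n` this reduces to ordinary dense softmax attention, which is a convenient sanity check; real sparse-attention models (e.g. Longformer-style patterns) combine such local windows with a few global tokens.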
📝 Abstract
Modeling long texts has been an essential technique in the field of natural language processing (NLP). With the ever-growing number of long documents, it is important to develop effective modeling methods that can process and analyze such texts. However, long texts pose significant research challenges for existing text models, owing to their more complex semantics and special characteristics. In this paper, we provide an overview of recent advances in long text modeling based on Transformer models. Firstly, we introduce the formal definition of long text modeling. Then, as the core content, we discuss how to process long input to satisfy the length limitation and how to design improved Transformer architectures that effectively extend the maximum context length. Following this, we discuss how to adapt Transformer models to capture the special characteristics of long texts. Finally, we describe four typical applications involving long text modeling and conclude with a discussion of future directions. Our survey intends to provide researchers with a synthesis of, and pointer to, related work on long text modeling.
Problem

Research questions and friction points this paper is trying to address.

Addressing length limitations in Transformer models for long texts
Enhancing Transformer architectures to handle complex long-text semantics
Adapting Transformers to capture unique long-text characteristics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Process long input within length limits
Design improved Transformer architectures
Adapt models for long text characteristics
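The first innovation point, processing long input within a fixed length limit, is commonly realized by splitting the document into overlapping chunks. The sketch below is a hedged illustration (chunk size and stride are assumed values, not taken from the paper): consecutive chunks overlap by `max_len - stride` tokens so that cross-chunk context is not lost at the boundaries.

```python
def chunk_tokens(tokens, max_len=512, stride=384):
    """Split a long token sequence into overlapping chunks.

    Each chunk holds at most `max_len` tokens; consecutive chunks
    overlap by (max_len - stride) tokens to preserve context across
    chunk boundaries. Returns a list of token lists.
    """
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the remaining tokens are already covered
    return chunks
```

Each chunk can then be encoded independently (or with a compressive/hierarchical model that fuses chunk representations), which is the pattern the survey's chunking and hierarchical-modeling categories describe.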