A Survey on Long Text Modeling with Transformers

📅 2023-02-28
🏛️ arXiv.org
📈 Citations: 57
Influential: 4
🤖 AI Summary
To address key challenges in long-text modeling—including limited context length, difficulty capturing long-range dependencies, and complex semantic structures—this paper presents a systematic survey of Transformer-based long-text processing techniques from 2018 to 2023. Methodologically, it introduces the first formal definition of “long-text modeling” to unify task conceptualization; proposes a structured taxonomy integrating context extension and long-range dependency modeling; and establishes a comprehensive classification framework encompassing sparse attention, chunking/compressive encoding, hierarchical modeling, and enhanced positional encoding. The study also synthesizes mainstream evaluation benchmarks and traces technical evolution. Results reveal scalability and efficient inference as critical bottlenecks, and identify promising future directions: model lightweighting, dynamic context allocation, and semantics-aware compression.
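The summary's taxonomy lists sparse attention as one route to longer contexts. As a minimal illustration (not the paper's own implementation), the sketch below shows windowed local attention in NumPy: each token attends only to neighbors within a fixed window, cutting cost from O(n²) to O(n·w). The function name and window size are illustrative assumptions.

```python
import numpy as np

def local_attention(q, k, v, window=2):
    """Windowed (local) sparse attention sketch.

    Each position i attends only to positions within `window` steps,
    so the cost is O(n * window) instead of the dense O(n^2).
    q, k, v: arrays of shape (n, d).
    """
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # scaled dot-product scores restricted to the local window
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[lo:hi]
    return out
```

With `window >= n` this reduces to ordinary dense softmax attention, which is a convenient sanity check; real sparse-attention models (e.g. Longformer-style patterns) combine such local windows with a few global tokens.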
📝 Abstract
Modeling long texts has been an essential technique in the field of natural language processing (NLP). With the ever-growing number of long documents, it is important to develop effective modeling methods that can process and analyze such texts. However, long texts pose significant research challenges for existing text models, owing to their more complex semantics and special characteristics. In this paper, we provide an overview of recent advances in long text modeling based on Transformer models. Firstly, we introduce the formal definition of long text modeling. Then, as the core content, we discuss how to process long input to satisfy the length limitation and how to design improved Transformer architectures that effectively extend the maximum context length. Following this, we discuss how to adapt Transformer models to capture the special characteristics of long texts. Finally, we describe four typical applications involving long text modeling and conclude with a discussion of future directions. Our survey intends to provide researchers with a synthesis of, and pointer to, related work on long text modeling.
Problem

Research questions and friction points this paper is trying to address.

Addressing length limitations in Transformer models for long texts
Enhancing Transformer architectures to handle complex long-text semantics
Adapting Transformers to capture unique long-text characteristics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Process long input within length limits
Design improved Transformer architectures
Adapt models for long text characteristics
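The first innovation point, processing long input within a fixed length limit, is commonly realized by splitting the document into overlapping chunks. The sketch below is a hedged illustration (chunk size and stride are assumed values, not taken from the paper): consecutive chunks overlap by `max_len - stride` tokens so that cross-chunk context is not lost at the boundaries.

```python
def chunk_tokens(tokens, max_len=512, stride=384):
    """Split a long token sequence into overlapping chunks.

    Each chunk holds at most `max_len` tokens; consecutive chunks
    overlap by (max_len - stride) tokens to preserve context across
    chunk boundaries. Returns a list of token lists.
    """
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the remaining tokens are already covered
    return chunks
```

Each chunk can then be encoded independently (or with a compressive/hierarchical model that fuses chunk representations), which is the pattern the survey's chunking and hierarchical-modeling categories describe.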