🤖 AI Summary
To address the scalability bottleneck of Transformer-based models in long-document summarization—stemming from the quadratic computational complexity $O(n^2)$ of self-attention—this paper proposes BiSparse-AAS, an efficient summarization framework that integrates bilinear sparse attention with an adaptive span mechanism. BiSparse-AAS dynamically determines local attention spans and employs parameter-efficient bilinear interactions to model long-range dependencies, thereby significantly reducing computational overhead while enhancing contextual representation capacity. Evaluated on the CNN/DailyMail and XSum benchmarks, it achieves average ROUGE improvements of about 68.1% and 52.6%, respectively, substantially outperforming existing sparse-attention and long-range modeling approaches. These results demonstrate BiSparse-AAS's dual advantages in computational efficiency and summary quality for efficient long-document summarization.
📝 Abstract
Transformer-based architectures have advanced text summarization, yet their quadratic complexity limits scalability on long documents. This paper introduces BiSparse-AAS (Bilinear Sparse Attention with Adaptive Spans), a novel framework that combines sparse attention, adaptive spans, and bilinear attention to address these limitations. Sparse attention reduces computational costs by focusing on the most relevant parts of the input, while adaptive spans dynamically adjust the attention ranges. Bilinear attention complements both by modeling complex token interactions within this refined context. BiSparse-AAS consistently outperforms state-of-the-art baselines in both extractive and abstractive summarization tasks, achieving average ROUGE improvements of about 68.1% on CNN/DailyMail and 52.6% on XSum, while maintaining strong performance on OpenWebText and Gigaword datasets. By addressing efficiency, scalability, and long-sequence modeling, BiSparse-AAS provides a unified, practical solution for real-world text summarization applications.
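The combination described above—bilinear scoring restricted to a local attention window—can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: the bilinear matrix `W` and the `span` parameter are hypothetical stand-ins, and where BiSparse-AAS learns its spans adaptively per head, this sketch uses a fixed window for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; exp(-inf) terms become 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bisparse_attention(Q, K, V, W, span):
    """Toy bilinear attention over a local span (illustrative only).

    Q, K, V: (n, d) query/key/value matrices.
    W: (d, d) bilinear interaction matrix (hypothetical parameter).
    span: each query i attends only to keys j with |i - j| <= span;
          the real framework would learn this range adaptively.
    """
    n, d = Q.shape
    scores = (Q @ W @ K.T) / np.sqrt(d)          # bilinear form q_i^T W k_j
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) <= span  # sparse local window
    scores = np.where(mask, scores, -np.inf)     # drop out-of-span pairs
    return softmax(scores, axis=-1) @ V
```

With `span=0` each token attends only to itself (the output equals `V`); widening the span trades computation for more context, which is the efficiency–quality dial the framework tunes.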