🤖 AI Summary
To address the semantic discontinuity that segmentation-based preprocessing introduces into long-document semantic segmentation, this paper proposes CrossFormer: an end-to-end, paragraph-level semantic segmentation model built on the Transformer architecture. Its core innovation is a cross-segment fusion module that dynamically models latent inter-paragraph semantic dependencies. Notably, CrossFormer is the first to integrate a learned semantic segmentation model directly into retrieval-augmented generation (RAG) systems, replacing heuristic, rule-based chunking. Leveraging self-attention and explicit cross-paragraph semantic alignment, it models the document globally and is trained end-to-end under supervision. Experiments show state-of-the-art performance on multiple public text semantic segmentation benchmarks. In RAG evaluation, CrossFormer significantly improves recall quality, response coherence, and, most notably, chunk-level semantic consistency (+23.6%).
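The fusion idea in the summary, using self-attention so each segment's representation absorbs context from every other segment, can be illustrated with a minimal sketch. This is a toy, unparameterized attention pass over precomputed segment vectors, not the paper's actual module; the function names and the 2-d embeddings are invented for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_segment_fusion(segment_vecs):
    """Toy cross-segment fusion: one self-attention pass (no learned
    projections) that mixes every segment vector into every other.

    segment_vecs: list of equal-length float vectors, one per segment.
    Returns one context-aware vector per segment.
    """
    fused = []
    for q in segment_vecs:
        # Dot-product similarity between this segment and all segments.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in segment_vecs]
        weights = softmax(scores)
        # Attention-weighted sum: the fused vector is a convex
        # combination of all segment vectors.
        fused.append([
            sum(w * v[d] for w, v in zip(weights, segment_vecs))
            for d in range(len(q))
        ])
    return fused

# Three toy segment embeddings; the first two lie on a similar topic axis.
segs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
fused = cross_segment_fusion(segs)
```

After fusion, the first segment's vector picks up a nonzero second component from its neighbors, which is the "information flows across segment boundaries" effect the summary describes; a trained model would additionally learn the query/key/value projections.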
📝 Abstract
Text semantic segmentation partitions a document into paragraphs with continuous semantics based on subject matter, contextual information, and document structure. Traditional approaches typically preprocess documents into segments to cope with input-length constraints, losing critical semantic information across segment boundaries. To address this, we present CrossFormer, a Transformer-based model featuring a novel cross-segment fusion module that dynamically models latent semantic dependencies across document segments, substantially improving segmentation accuracy. CrossFormer can also replace rule-based chunking methods within Retrieval-Augmented Generation (RAG) systems, producing more semantically coherent chunks that improve their effectiveness. Comprehensive evaluations confirm CrossFormer's state-of-the-art performance on public text semantic segmentation datasets, alongside considerable gains on RAG benchmarks.
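The contrast the abstract draws, rule-based chunking versus chunking at model-predicted semantic boundaries, can be sketched as follows. The `boundary_predictor` interface and the word-overlap stand-in are hypothetical illustrations, not CrossFormer's API; a real segmenter would score semantic continuity with a trained classifier.

```python
def rule_based_chunks(sentences, size=2):
    """Baseline: fixed-size chunking, which can split a topic mid-stream."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]

def model_based_chunks(sentences, boundary_predictor):
    """Start a new chunk wherever the segmenter predicts a topic boundary.

    boundary_predictor(prev, cur) -> bool stands in for a learned
    per-boundary decision (hypothetical interface, for illustration only).
    """
    chunks, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        if boundary_predictor(prev, cur):
            chunks.append(current)
            current = [cur]
        else:
            current.append(cur)
    chunks.append(current)
    return chunks

def no_overlap(prev, cur):
    """Crude stand-in predictor: call it a topic break if adjacent
    sentences share no words. A trained model replaces this."""
    return not (set(prev.lower().split()) & set(cur.lower().split()))

doc = [
    "Cats sleep a lot.",
    "Cats also purr.",
    "Stocks fell today.",
    "Stocks may rebound.",
]
chunks = model_based_chunks(doc, no_overlap)
```

Here the boundary-driven chunker keeps each topic together, while a fixed window of three sentences would strand the topic shift inside one chunk; that gap is what a learned segmenter plugged into a RAG pipeline is meant to close.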