Structured Packing in LLM Training Improves Long Context Utilization

📅 2023-12-28
🏛️ arXiv.org
📈 Citations: 12 · Influential: 1
🤖 AI Summary
To address insufficient context utilization in long-context large language models (LLMs), including the prominent "lost-in-the-middle" phenomenon and weak long-text comprehension, this paper proposes SPLiCe: a retrieval-based structured data-packing method that constructs semantically coherent long training sequences by collating mutually relevant documents. SPLiCe requires no architectural modifications to the base model, and even brief fine-tuning with it yields significant gains on long-context question-answering benchmarks such as Qasper and HotpotQA, with consistent improvements across 3B, 7B, and 13B model scales. The paper's analysis also reveals an intriguing transfer effect: training on programming code improves performance on natural-language long-context tasks. Overall, SPLiCe offers a scalable, low-overhead recipe for improving long-context utilization in LLMs.
📝 Abstract
Recent advancements in long-context large language models have attracted significant attention, yet their practical applications often suffer from suboptimal context utilization. This study investigates structuring training data to enhance semantic interdependence, demonstrating that this approach effectively improves context utilization. To this end, we introduce the Structured Packing for Long Context (SPLiCe) method, which utilizes retrieval to collate mutually relevant documents into long and coherent training examples. We validate SPLiCe empirically across models of varying sizes -- 3B, 7B, and 13B -- achieving improved performance in long-context tasks, such as Qasper and HotpotQA. Remarkably, even brief fine-tuning with SPLiCe is sufficient to realize these benefits. Additionally, SPLiCe effectively mitigates the lost-in-the-middle phenomenon often observed in large models. Our comprehensive analysis of SPLiCe explores its design choices and reveals intriguing transfer effects; for instance, training on programming code enhances performance on natural language tasks.
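The core idea, using retrieval to collate mutually relevant documents into one long training example, can be illustrated with a minimal sketch. This is not the authors' implementation: the TF-IDF retriever, the greedy nearest-neighbor ordering, and names such as `pack_corpus` and `CONTEXT_LEN` are illustrative assumptions (the paper itself studies the choice of retrieval method and other packing design decisions).

```python
# Minimal sketch of retrieval-based structured packing in the spirit of
# SPLiCe. Assumptions: a small corpus of plain-text documents, a TF-IDF
# retriever, and a whitespace-token length budget. All names here are
# hypothetical, not from the paper's code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

CONTEXT_LEN = 2048  # target length of a packed training example, in tokens


def pack_corpus(docs):
    """Greedily collate mutually relevant documents into long examples."""
    tfidf = TfidfVectorizer().fit_transform(docs)
    sims = cosine_similarity(tfidf)  # pairwise document similarities
    unused = set(range(len(docs)))
    examples = []
    while unused:
        seed = unused.pop()  # start a new example from any remaining document
        example, length = [docs[seed]], len(docs[seed].split())
        current = seed
        while length < CONTEXT_LEN and unused:
            # Append the most similar still-unused document, so consecutive
            # documents in the packed sequence are semantically related
            # (unlike the standard practice of packing random documents).
            nxt = max(unused, key=lambda j: sims[current, j])
            unused.remove(nxt)
            example.append(docs[nxt])
            length += len(docs[nxt].split())
            current = nxt
        examples.append("\n\n".join(example))
    return examples
```

A real pipeline would replace the dense N×N similarity matrix with an approximate-nearest-neighbor index and measure length with the model's tokenizer rather than whitespace splitting; the sketch only conveys the packing logic.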
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Long-Context Understanding
Context Utilization
Innovation

Methods, ideas, or system contributions that make the work stand out.

SPLiCe
Long-Context Understanding
Structured Packing Method
Konrad Staniszewski
University of Warsaw, NVIDIA
Machine Learning, Reinforcement Learning, Algorithms, Language Models
Szymon Tworkowski
University of Warsaw, Ideas NCBR
Yu Zhao
University of Edinburgh
Sebastian Jaszczur
Anthropic (past: IDEAS, University of Warsaw)
Machine Learning
Henryk Michalewski
Google DeepMind
Łukasz Kuciński
University of Warsaw, Ideas NCBR, Institute of Mathematics, Polish Academy of Sciences
Piotr Miłoś
University of Warsaw, Ideas NCBR, Institute of Mathematics, Polish Academy of Sciences, deepsense.ai