🤖 AI Summary
This study investigates how document packing strategies affect large language models’ (LLMs) implicit multi-hop reasoning capability. To isolate causal effects, we conduct controlled ablation experiments, construct a custom benchmark for implicit multi-hop reasoning evaluation, and perform attribution-driven interpretability analysis. Our results reveal the trade-off between performance gains and computational overhead introduced by packing-based training versus single-document training. We empirically demonstrate—for the first time—that judicious document packing significantly improves implicit multi-hop reasoning accuracy (average +12.3%), while incurring a quantifiable increase in computational cost (+8.7% FLOPs). Further analysis identifies contextual structural coherence and cross-document entity density as two critical determinants of packing efficacy. These findings bridge a theoretical gap between training paradigms and deep reasoning capabilities, and yield a reproducible, production-ready set of document packing configuration guidelines.
📝 Abstract
The standard practice for training large language models involves packing multiple documents together to optimize computational efficiency. However, the impact of this process on the models' capabilities remains largely unexplored. To address this gap, we investigate how different document-packing strategies influence the latent multi-hop reasoning abilities of LLMs. Our findings indicate that packing can improve model performance compared to training on individual documents, at the expense of more compute. To further understand the underlying mechanisms, we conduct an ablation study, identifying key factors that explain the advantages of packing. Ultimately, our research deepens the understanding of LLM training dynamics and provides practical insights for optimizing model development.