AI Summary
This work addresses the inefficiency and limited effectiveness of existing long-context language models, which lack explicit modeling of local-global information structure. Inspired by discourse comprehension theories in cognitive science, we propose Hierarchical Construction-Integration (HiCI) attention, the first approach to incorporate an explicit hierarchical structure as an inductive bias into long-context modeling. HiCI first constructs segment-level representations, then integrates them into a shared global context, and uses both to guide local attention computation. Using parameter-efficient fine-tuning of LLaMA-2 with less than a 5.5% increase in parameters, our method extends the context length from 4K to 100K tokens (7B) and 64K tokens (13B). The resulting models consistently outperform strong baselines across language modeling, retrieval, and instruction-following tasks, surpassing GPT-3.5-Turbo-16K in code understanding and matching closed-source models in topic retrieval.
Abstract
Long-context language modeling is commonly framed as a scalability challenge of token-level attention, yet local-to-global information structuring remains largely implicit in existing approaches. Drawing on cognitive theories of discourse comprehension, we propose HiCI (Hierarchical Construction-Integration), a hierarchical attention module that constructs segment-level representations, integrates them into a shared global context, and broadcasts both to condition segment-level attention. We validate HiCI through parameter-efficient adaptation of LLaMA-2 with less than 5.5% additional parameters, extending the context from 4K to 100K tokens (7B) and 64K tokens (13B). Across language modeling, retrieval, and instruction-following benchmarks, HiCI yields consistent improvements over strong baselines, including matching proprietary models on topic retrieval and surpassing GPT-3.5-Turbo-16K on code comprehension. These results demonstrate the effectiveness of explicit hierarchical structuring as an inductive bias for long-context modeling.
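The construct, integrate, and broadcast pipeline described in the abstract can be illustrated with a minimal single-head NumPy sketch. This is not the authors' implementation. Several pieces are assumptions made purely for illustration: mean pooling for segment construction, a small set of learnable global query vectors (`global_queries`) for integration, and prepending the global context plus the segment's own summary as extra keys and values. The function name `hici_attention_sketch` is also hypothetical. The sketch omits causal masking, multiple heads, and any LLaMA-2 integration, so it shows data flow only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q (n, d), k and v (m, d) -> (n, d).
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def hici_attention_sketch(x, seg_len, w_q, w_k, w_v, global_queries):
    """Hypothetical construct -> integrate -> broadcast sketch (not the paper's code).

    x: (T, d) token states, with T divisible by seg_len.
    w_q, w_k, w_v: (d, d) projection matrices.
    global_queries: (g, d) learnable query vectors (an assumption of this sketch).
    No causal mask is applied, so the summaries can see "future" tokens.
    """
    T, d = x.shape
    assert T % seg_len == 0, "sequence length must be a multiple of seg_len"
    n_seg = T // seg_len
    segs = x.reshape(n_seg, seg_len, d)

    # 1) Construction: one summary vector per segment (mean pooling is assumed).
    seg_summ = segs.mean(axis=1)  # (n_seg, d)

    # 2) Integration: global slots attend over all segment summaries.
    global_ctx = attention(global_queries, seg_summ, seg_summ)  # (g, d)

    out = np.empty_like(x)
    for i in range(n_seg):
        # 3) Broadcast: each segment attends to [global context, its own summary, its tokens].
        ctx = np.concatenate([global_ctx, seg_summ[i:i + 1], segs[i]], axis=0)
        q = segs[i] @ w_q
        k = ctx @ w_k
        v = ctx @ w_v
        out[i * seg_len:(i + 1) * seg_len] = attention(q, k, v)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, T, seg_len, g = 8, 16, 4, 2
    x = rng.standard_normal((T, d))
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    global_queries = rng.standard_normal((g, d))
    y = hici_attention_sketch(x, seg_len, w_q, w_k, w_v, global_queries)
    print(y.shape)  # (16, 8)
```

In this sketch each segment attends only to its own `seg_len` tokens plus `g + 1` prefix vectors. Its attention cost therefore grows roughly as T × (seg_len + g + 1) rather than T², while the prefix vectors still carry information across segments.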