HiCI: Hierarchical Construction-Integration for Long-Context Attention

📅 2026-03-21
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the inefficiency and limited effectiveness of existing long-context language models, which lack explicit modeling of local-global information structures. Inspired by discourse comprehension theories in cognitive science, we propose Hierarchical Construction-Integration (HiCI) attention, the first approach to incorporate an explicit hierarchical structure as an inductive bias into long-context modeling. HiCI first constructs segment-level representations, then integrates them into a shared global context, and uses both jointly to guide local attention computation. Using parameter-efficient fine-tuning based on LLaMA-2 with a less than 5.5% increase in parameters, our method successfully extends context lengths to 100K tokens (7B) and 64K tokens (13B). The resulting models significantly outperform strong baselines across language modeling, retrieval, and instruction-following tasks, surpassing GPT-3.5-Turbo-16K in code understanding and matching closed-source models in topic retrieval performance.

๐Ÿ“ Abstract
Long-context language modeling is commonly framed as a scalability challenge of token-level attention, yet local-to-global information structuring remains largely implicit in existing approaches. Drawing on cognitive theories of discourse comprehension, we propose HiCI (Hierarchical Construction-Integration), a hierarchical attention module that constructs segment-level representations, integrates them into a shared global context, and broadcasts both to condition segment-level attention. We validate HiCI through parameter-efficient adaptation of LLaMA-2 with fewer than 5.5% additional parameters, extending context from 4K to 100K tokens (7B) and 64K tokens (13B). Across language modeling, retrieval, and instruction-following benchmarks, HiCI yields consistent improvements over strong baselines, including matching proprietary models on topic retrieval and surpassing GPT-3.5-Turbo-16K on code comprehension. These results demonstrate the effectiveness of explicit hierarchical structuring as an inductive bias for long-context modeling.
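The construct → integrate → broadcast flow described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual architecture; all names (`hici_sketch`, mean-pooling as the "construction" step, additive conditioning of local queries) are simplifying assumptions chosen to make the three stages explicit.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hici_sketch(tokens, seg_len):
    """Schematic hierarchical construction-integration attention.

    tokens: (T, d) token embeddings, with T divisible by seg_len.
    """
    T, d = tokens.shape
    segs = tokens.reshape(T // seg_len, seg_len, d)   # (S, seg_len, d)

    # 1. Construction: build one summary per segment (mean-pool stand-in).
    seg_repr = segs.mean(axis=1)                      # (S, d)

    # 2. Integration: attend over segment summaries to form a global context.
    scores = seg_repr @ seg_repr.T / np.sqrt(d)       # (S, S)
    global_ctx = softmax(scores) @ seg_repr           # (S, d)

    # 3. Broadcast: condition each segment's local attention on its
    #    integrated global context (here via additive query conditioning).
    out = np.empty_like(segs)
    for s in range(segs.shape[0]):
        q = segs[s] + global_ctx[s]                   # conditioned queries
        att = softmax(q @ segs[s].T / np.sqrt(d))     # local attention
        out[s] = att @ segs[s]
    return out.reshape(T, d)
```

Local attention here stays quadratic only within each segment, while cross-segment information flows solely through the pooled summaries, which is the structural bias the abstract attributes to HiCI.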
Problem

Research questions and friction points this paper is trying to address.

long-context modeling
hierarchical structure
attention mechanism
information structuring
language modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical attention
long-context modeling
construction-integration
inductive bias
parameter-efficient adaptation