HiCI: Hierarchical Construction-Integration for Long-Context Attention

📅 2026-03-21
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the inefficiency and limited effectiveness of existing long-context language models, which lack explicit modeling of local-global information structures. Inspired by discourse comprehension theories in cognitive science, we propose Hierarchical Construction-Integration (HiCI) attention, the first approach to incorporate an explicit hierarchical structure as an inductive bias into long-context modeling. HiCI first constructs segment-level representations, then integrates them into a shared global context, and uses both jointly to guide local attention computation. Using parameter-efficient fine-tuning based on LLaMA-2 with a less than 5.5% increase in parameters, our method successfully extends context lengths to 100K tokens (7B) and 64K tokens (13B). The resulting models significantly outperform strong baselines across language modeling, retrieval, and instruction-following tasks, surpassing GPT-3.5-Turbo-16K in code understanding and matching closed-source models in topic retrieval performance.

๐Ÿ“ Abstract
Long-context language modeling is commonly framed as a scalability challenge of token-level attention, yet local-to-global information structuring remains largely implicit in existing approaches. Drawing on cognitive theories of discourse comprehension, we propose HiCI (Hierarchical Construction-Integration), a hierarchical attention module that constructs segment-level representations, integrates them into a shared global context, and broadcasts both to condition segment-level attention. We validate HiCI through parameter-efficient adaptation of LLaMA-2 with fewer than 5.5% additional parameters, extending context from 4K to 100K tokens (7B) and 64K tokens (13B). Across language modeling, retrieval, and instruction-following benchmarks, HiCI yields consistent improvements over strong baselines, including matching proprietary models on topic retrieval and surpassing GPT-3.5-Turbo-16K on code comprehension. These results demonstrate the effectiveness of explicit hierarchical structuring as an inductive bias for long-context modeling.
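The construct → integrate → broadcast flow described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual architecture; all names (`hici_sketch`, mean-pooling as the "construction" step, additive conditioning of local queries) are simplifying assumptions chosen to make the three stages explicit.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hici_sketch(tokens, seg_len):
    """Schematic hierarchical construction-integration attention.

    tokens: (T, d) token embeddings, with T divisible by seg_len.
    """
    T, d = tokens.shape
    segs = tokens.reshape(T // seg_len, seg_len, d)   # (S, seg_len, d)

    # 1. Construction: build one summary per segment (mean-pool stand-in).
    seg_repr = segs.mean(axis=1)                      # (S, d)

    # 2. Integration: attend over segment summaries to form a global context.
    scores = seg_repr @ seg_repr.T / np.sqrt(d)       # (S, S)
    global_ctx = softmax(scores) @ seg_repr           # (S, d)

    # 3. Broadcast: condition each segment's local attention on its
    #    integrated global context (here via additive query conditioning).
    out = np.empty_like(segs)
    for s in range(segs.shape[0]):
        q = segs[s] + global_ctx[s]                   # conditioned queries
        att = softmax(q @ segs[s].T / np.sqrt(d))     # local attention
        out[s] = att @ segs[s]
    return out.reshape(T, d)
```

Local attention here stays quadratic only within each segment, while cross-segment information flows solely through the pooled summaries, which is the structural bias the abstract attributes to HiCI.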
Problem

Research questions and friction points this paper is trying to address.

long-context modeling
hierarchical structure
attention mechanism
information structuring
language modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical attention
long-context modeling
construction-integration
inductive bias
parameter-efficient adaptation