UT-ACA: Uncertainty-Triggered Adaptive Context Allocation for Long-Context Inference

📅 2026-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation of large language models in long-context reasoning, which stems from attention dilution and reduced out-of-distribution generalization. Existing approaches typically rely on fixed context budgets and fail to accommodate the varying contextual demands of individual tokens. To overcome this limitation, the paper proposes UT-ACA, an adaptive context allocation framework that allocates context resources based on per-token uncertainty. UT-ACA combines semantic embeddings and logit confidence to estimate uncertainty in real time and models its cumulative effect across decoding steps. When evidence is insufficient, the framework triggers context fallback, window expansion, and token regeneration. This yields on-demand context utilization, substantially reducing average context consumption while preserving generation quality and improving inference efficiency.

📝 Abstract
Long-context inference remains challenging for large language models due to attention dilution and out-of-distribution degradation. Context selection mitigates this limitation by attending to a subset of key-value cache entries, yet most methods allocate a fixed context budget throughout decoding despite highly non-uniform token-level contextual demands. To address this issue, we propose Uncertainty-Triggered Adaptive Context Allocation (UT-ACA), an inference-time framework that dynamically adjusts the context window based on token-wise uncertainty. UT-ACA learns an uncertainty detector that combines semantic embeddings with logit-based confidence while accounting for uncertainty accumulation across decoding steps. When insufficient evidence is indicated, UT-ACA selectively rolls back, expands the context window, and regenerates the token with additional support. Experiments show that UT-ACA substantially reduces average context usage while preserving generation quality in long-context settings.
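The abstract describes a decode-time loop: estimate per-token uncertainty from logit confidence, accumulate it across steps, and on a trigger roll back, expand the context window, and regenerate. A minimal sketch of that control flow, assuming a hypothetical `model_step(tokens, budget)` interface, an EMA for uncertainty accumulation, and 1 − max softmax probability as the logit-confidence proxy (the paper additionally uses semantic embeddings, which are omitted here):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def token_uncertainty(logits):
    # Logit-based proxy only: 1 - max softmax probability.
    # (UT-ACA also folds in semantic embeddings; not modeled here.)
    return 1.0 - max(softmax(logits))

def decode_with_adaptive_context(model_step, max_steps=8, base_budget=4,
                                 expand_factor=2, threshold=0.6, decay=0.5):
    """Uncertainty-triggered loop (illustrative): accumulate per-token
    uncertainty with an exponential moving average; when it crosses
    `threshold`, expand the context budget and regenerate the token."""
    budget = base_budget
    acc = 0.0          # accumulated uncertainty across decoding steps
    tokens = []
    while len(tokens) < max_steps:
        token, logits = model_step(tokens, budget)
        acc = decay * acc + (1 - decay) * token_uncertainty(logits)
        if acc > threshold:
            # Insufficient evidence: fall back, widen the window, retry.
            budget *= expand_factor
            acc = 0.0
            continue   # regenerate the same position with more context
        tokens.append(token)
    return tokens, budget

def toy_model(tokens, budget):
    # Stand-in for an LLM step: near-uniform (uncertain) logits under a
    # small budget, sharply peaked (confident) logits once expanded.
    if budget >= 8:
        return len(tokens), [5.0, 0.0, 0.0]
    return len(tokens), [0.1, 0.0, 0.0]
```

With the toy model, uncertainty accumulates over the first few low-budget steps until the trigger fires, the budget doubles from 4 to 8, and decoding finishes confidently at the larger window; a real deployment would instead adjust which key-value cache entries are attended to.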
Problem

Research questions and friction points this paper is trying to address.

long-context inference
attention dilution
context selection
out-of-distribution degradation
context budget
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive context allocation
uncertainty detection
long-context inference
dynamic context window
token-level uncertainty
Lang Zhou (Sun Yat-sen University)
Shuxuan Li (Sun Yat-sen University)
Zhuohao Li (Sun Yat-sen University)
Shi Liu (Southern University of Science and Technology)
Zhilin Zhao (Sun Yat-sen University)
Wei-Shi Zheng
Professor @ Sun Yat-sen University
Computer Vision · Pattern Recognition · Machine Learning