🤖 AI Summary
This work addresses the performance degradation of large language models in long-context reasoning, which stems from attention dilution and reduced out-of-distribution generalization. Existing approaches typically rely on fixed context budgets, failing to accommodate the varying contextual demands of individual tokens. To overcome this limitation, the paper proposes UT-ACA, an adaptive context allocation framework that dynamically allocates context resources based on per-token uncertainty. UT-ACA combines semantic embeddings with logit confidence to estimate uncertainty in real time and models its cumulative effect across decoding steps. When evidence is insufficient, the framework triggers context rollback, window expansion, and token regeneration. This enables on-demand context utilization, significantly reducing average context consumption while preserving generation quality and improving inference efficiency.
📝 Abstract
Long-context inference remains challenging for large language models due to attention dilution and out-of-distribution degradation. Context selection mitigates this limitation by attending to a subset of key-value cache entries, yet most methods allocate a fixed context budget throughout decoding despite highly non-uniform token-level contextual demands. To address this issue, we propose Uncertainty-Triggered Adaptive Context Allocation (UT-ACA), an inference-time framework that dynamically adjusts the context window based on token-wise uncertainty. UT-ACA learns an uncertainty detector that combines semantic embeddings with logit-based confidence while accounting for uncertainty accumulation across decoding steps. When insufficient evidence is indicated, UT-ACA selectively rolls back, expands the context window, and regenerates the token with additional support. Experiments show that UT-ACA substantially reduces average context usage while preserving generation quality in long-context settings.
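The uncertainty-triggered control loop described in the abstract (score each decoded token, and on low confidence roll back, widen the context window, and regenerate) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the entropy-based uncertainty score, the `decode_step` stand-in for the model, and the threshold and window constants are all assumptions made for the example.

```python
# Hedged sketch of an uncertainty-triggered adaptive-context decoding loop
# in the spirit of UT-ACA. All names and constants are illustrative.
import math
from typing import List, Tuple

THRESHOLD = 0.5   # uncertainty above this triggers rollback + expansion (assumed)
MAX_CONTEXT = 8   # hard cap on the context window (assumed)

def uncertainty(probs: List[float]) -> float:
    """Logit-based confidence proxy: normalized entropy of the
    next-token distribution, in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def decode_step(context_size: int) -> Tuple[str, List[float]]:
    """Stand-in for the model: a larger context yields a sharper
    (more confident) next-token distribution."""
    peak = min(0.95, 0.5 + 0.06 * context_size)
    rest = (1.0 - peak) / 3.0
    return "tok", [peak, rest, rest, rest]

def generate(num_tokens: int, base_context: int = 2) -> List[Tuple[str, int]]:
    """Decode tokens with a small base context; when the uncertainty
    detector fires, roll back, expand the window, and regenerate."""
    out = []
    for _ in range(num_tokens):
        ctx = base_context
        token, probs = decode_step(ctx)
        while uncertainty(probs) > THRESHOLD and ctx < MAX_CONTEXT:
            ctx += 2                          # expand the context window
            token, probs = decode_step(ctx)   # regenerate with more support
        out.append((token, ctx))
    return out
```

Most tokens never trigger the fallback path in a real system, so the average context stays near the small base budget; only high-uncertainty tokens pay for a wider window, which is the source of the claimed savings.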