Entropy-Guided k-Guard Sampling for Long-Horizon Autoregressive Video Generation

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
In long-horizon autoregressive video generation, static top-k/top-p sampling often degrades output quality because video tokens are semantically sparse and spatiotemporally redundant: it injects noise in low-uncertainty regions and compounds errors in high-uncertainty regions. To address this, the work proposes Entropy-Guided k-Guard (ENkG) sampling, a training-free, model-agnostic strategy that dynamically adjusts the candidate set size based on the entropy of the token-level predictive distribution. Specifically, it narrows the candidate set under low entropy to preserve structural consistency and expands it under high entropy to mitigate error propagation. With negligible computational overhead and without altering the autoregressive framework, ENkG significantly improves both the perceptual quality and temporal coherence of generated videos, outperforming static sampling approaches.

📝 Abstract
Autoregressive (AR) architectures have achieved significant success in LLMs, inspiring explorations in video generation. In LLMs, top-p/top-k sampling strategies work exceptionally well: language tokens have high semantic density and low redundancy, so a fixed candidate-set size already strikes a balance between semantic accuracy and generation diversity. In contrast, video tokens have low semantic density and high spatio-temporal redundancy. This mismatch makes static top-k/top-p strategies ineffective for video decoders: they either introduce unnecessary randomness in low-uncertainty regions (static backgrounds) or get stuck in early errors in high-uncertainty regions (foreground objects). Prediction errors accumulate as more frames are generated and eventually severely degrade long-horizon quality. To address this, we propose Entropy-Guided k-Guard (ENkG) sampling, a simple yet effective strategy that adapts sampling to token-wise dispersion, quantified by the entropy of each token's predicted distribution. ENkG uses adaptive token candidate sizes: in low-entropy regions, it employs fewer candidates to suppress redundant noise and preserve structural integrity; in high-entropy regions, it uses more candidates to mitigate error compounding. ENkG is model-agnostic, training-free, and adds negligible overhead. Experiments demonstrate consistent improvements in perceptual quality and structural stability over static top-k/top-p strategies.
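The core idea — grow or shrink the top-k candidate set with the entropy of each token's predicted distribution — can be sketched as below. The paper does not specify its exact entropy-to-k schedule, so the linear mapping, the `k_min`/`k_max` bounds, and the function name here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def enkg_sample(logits, k_min=2, k_max=256, rng=None):
    """Entropy-guided adaptive top-k sampling (sketch).

    The linear entropy-to-k schedule below is an illustrative
    assumption; the paper's exact mapping may differ.
    """
    rng = rng or np.random.default_rng()
    # Softmax over the token vocabulary (numerically stabilized).
    z = logits - logits.max()
    p = np.exp(z)
    p /= p.sum()
    # Shannon entropy, normalized by its maximum log|V| to land in [0, 1].
    h = -(p * np.log(p + 1e-12)).sum()
    h_norm = h / np.log(len(p))
    # Low entropy -> few candidates (preserve structure);
    # high entropy -> many candidates (avoid committing to early errors).
    k = int(round(k_min + (k_max - k_min) * h_norm))
    k = max(k_min, min(k, len(p)))
    # Restrict to the top-k tokens and renormalize before sampling.
    top = np.argpartition(p, -k)[-k:]
    q = p[top] / p[top].sum()
    return int(rng.choice(top, p=q)), k
```

On a near-deterministic distribution (e.g. a static-background token) the normalized entropy is close to 0 and `k` collapses toward `k_min`, while a flat distribution pushes `k` toward `k_max`, matching the adaptive behavior the abstract describes.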
Problem

Research questions and friction points this paper is trying to address.

autoregressive video generation
long-horizon generation
sampling strategy
spatio-temporal redundancy
error accumulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-Guided Sampling
k-Guard
Autoregressive Video Generation
Adaptive Token Selection
Long-Horizon Generation