Entropy-Guided k-Guard Sampling for Long-Horizon Autoregressive Video Generation

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
In long-horizon autoregressive video generation, static top-k/top-p sampling often degrades output quality because video tokens are semantically sparse and spatiotemporally redundant: it injects noise in low-uncertainty regions and compounds errors in high-uncertainty regions. To address this, the work proposes Entropy-Guided k-Guard (ENkG) sampling, a training-free, model-agnostic strategy that dynamically adjusts the candidate set size based on the entropy of the token-level predictive distribution. Specifically, it narrows the candidate set under low entropy to preserve structural consistency and expands it under high entropy to mitigate error propagation. With negligible computational overhead and without altering the autoregressive framework, ENkG significantly improves both the perceptual quality and temporal coherence of generated videos, outperforming static sampling approaches.

📝 Abstract
Autoregressive (AR) architectures have achieved significant success in LLMs, inspiring explorations in video generation. In LLMs, top-p/top-k sampling strategies work exceptionally well: language tokens have high semantic density and low redundancy, so a fixed candidate-set size already strikes a balance between semantic accuracy and generation diversity. In contrast, video tokens have low semantic density and high spatio-temporal redundancy. This mismatch makes static top-k/top-p strategies ineffective for video decoders: they either introduce unnecessary randomness in low-uncertainty regions (static backgrounds) or get stuck in early errors in high-uncertainty regions (foreground objects). Prediction errors accumulate as more frames are generated and eventually severely degrade long-horizon quality. To address this, we propose Entropy-Guided k-Guard (ENkG) sampling, a simple yet effective strategy that adapts sampling to token-wise dispersion, quantified by the entropy of each token's predicted distribution. ENkG uses adaptive token candidate sizes: in low-entropy regions, it employs fewer candidates to suppress redundant noise and preserve structural integrity; in high-entropy regions, it uses more candidates to mitigate error compounding. ENkG is model-agnostic, training-free, and adds negligible overhead. Experiments demonstrate consistent improvements in perceptual quality and structural stability over static top-k/top-p strategies.
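The core idea — grow or shrink the top-k candidate set with the entropy of each token's predicted distribution — can be sketched as below. The paper does not specify its exact entropy-to-k schedule, so the linear mapping, the `k_min`/`k_max` bounds, and the function name here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def enkg_sample(logits, k_min=2, k_max=256, rng=None):
    """Entropy-guided adaptive top-k sampling (sketch).

    The linear entropy-to-k schedule below is an illustrative
    assumption; the paper's exact mapping may differ.
    """
    rng = rng or np.random.default_rng()
    # Softmax over the token vocabulary (numerically stabilized).
    z = logits - logits.max()
    p = np.exp(z)
    p /= p.sum()
    # Shannon entropy, normalized by its maximum log|V| to land in [0, 1].
    h = -(p * np.log(p + 1e-12)).sum()
    h_norm = h / np.log(len(p))
    # Low entropy -> few candidates (preserve structure);
    # high entropy -> many candidates (avoid committing to early errors).
    k = int(round(k_min + (k_max - k_min) * h_norm))
    k = max(k_min, min(k, len(p)))
    # Restrict to the top-k tokens and renormalize before sampling.
    top = np.argpartition(p, -k)[-k:]
    q = p[top] / p[top].sum()
    return int(rng.choice(top, p=q)), k
```

On a near-deterministic distribution (e.g. a static-background token) the normalized entropy is close to 0 and `k` collapses toward `k_min`, while a flat distribution pushes `k` toward `k_max`, matching the adaptive behavior the abstract describes.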
Problem

Research questions and friction points this paper is trying to address.

autoregressive video generation
long-horizon generation
sampling strategy
spatio-temporal redundancy
error accumulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-Guided Sampling
k-Guard
Autoregressive Video Generation
Adaptive Token Selection
Long-Horizon Generation