HERO: Hierarchical Extrapolation and Refresh for Efficient World Models

📅 2025-08-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generative world models rely on diffusion models, but their iterative sampling leads to low inference efficiency; existing acceleration methods—when directly adapted—often degrade output quality. This paper proposes a training-free hierarchical acceleration framework: at shallow layers, patch-level dynamic token recomputation mitigates feature drift; at deep layers, frequency-aware linear extrapolation skips attention and feed-forward network computations by directly predicting intermediate features. The method uncovers and exploits implicit feature coupling structures inherent in world models, remains compatible with FlashAttention, and incurs no additional measurement overhead. Evaluated on standard benchmarks, it achieves a 1.73× end-to-end speedup while preserving near-lossless image fidelity (PSNR degradation <0.3 dB), substantially outperforming state-of-the-art diffusion acceleration approaches.

📝 Abstract
Generation-driven world models create immersive virtual environments but suffer slow inference due to the iterative nature of diffusion models. While recent advances have improved diffusion model efficiency, directly applying these techniques to world models introduces limitations such as quality degradation. In this paper, we present HERO, a training-free hierarchical acceleration framework tailored for efficient world models. Owing to the multi-modal nature of world models, we identify a feature coupling phenomenon, wherein shallow layers exhibit high temporal variability, while deeper layers yield more stable feature representations. Motivated by this, HERO adopts hierarchical strategies to accelerate inference: (i) In shallow layers, a patch-wise refresh mechanism efficiently selects tokens for recomputation. With patch-wise sampling and frequency-aware tracking, it avoids extra metric computation and remains compatible with FlashAttention. (ii) In deeper layers, a linear extrapolation scheme directly estimates intermediate features. This completely bypasses the computations in attention modules and feed-forward networks. Our experiments show that HERO achieves a 1.73× speedup with minimal quality degradation, significantly outperforming existing diffusion acceleration methods.
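The deep-layer extrapolation idea in (ii) can be illustrated with a toy sketch: assuming a deep block's output evolves roughly linearly across adjacent denoising steps, the output at step t can be predicted from the two most recent cached outputs instead of running attention and the FFN. The function name and the two-point extrapolation rule below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def extrapolate_block_output(cache, step):
    """Predict a deep block's output at `step` by linear extrapolation
    from the two most recent cached outputs (f_t ≈ 2*f_{t-1} - f_{t-2}),
    skipping the attention + FFN computation entirely."""
    f_prev, f_prev2 = cache[step - 1], cache[step - 2]
    return 2.0 * f_prev - f_prev2

# Toy check: features that drift linearly over steps are recovered exactly.
cache = {t: np.full((4, 8), float(t)) for t in range(3)}  # f_t = t everywhere
pred = extrapolate_block_output(cache, 2)
print(np.allclose(pred, cache[2]))  # True for a linear trajectory
```

Because the prediction uses only cached tensors, it costs one elementwise operation per skipped block, which is where the speedup in deep layers would come from.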
Problem

Research questions and friction points this paper is trying to address.

Slow inference of diffusion-based world models caused by iterative sampling
Quality degradation when existing diffusion acceleration methods are applied directly
Unexploited feature coupling between shallow and deep layers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Patch-wise refresh mechanism for selective token recomputation in shallow layers
Linear extrapolation that bypasses attention and FFN computation in deep layers
Training-free, FlashAttention-compatible acceleration framework
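As a rough illustration of the shallow-layer idea (recomputing only the most dynamic tokens), one could rank patches by how much their features drifted since the last full step and refresh just the top fraction. The drift score and top-k selection below are illustrative stand-ins; the paper's actual mechanism avoids this explicit metric computation via patch-wise sampling and frequency-aware tracking.

```python
import numpy as np

def select_patches_to_refresh(feat_curr, feat_prev, refresh_ratio=0.25):
    """Pick the patches whose token features drifted most since the last
    step; only these are recomputed, the rest reuse cached outputs.
    feat_*: (num_patches, tokens_per_patch, dim) arrays."""
    drift = np.linalg.norm(feat_curr - feat_prev, axis=(1, 2))  # per-patch change
    k = max(1, int(refresh_ratio * len(drift)))
    return np.argsort(drift)[-k:]  # indices of the k most dynamic patches

rng = np.random.default_rng(0)
prev = rng.standard_normal((8, 16, 32))
curr = prev.copy()
curr[3] += 5.0  # only patch 3 changes sharply between steps
print(select_patches_to_refresh(curr, prev, refresh_ratio=0.125))  # [3]
```

Selecting whole patches rather than individual tokens keeps the kept/recomputed token sets contiguous, which is one plausible reason such a scheme can stay compatible with fused kernels like FlashAttention.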